Skip to content

Implement segment range threshold for automatic query prioritization#17009

Merged
abhishekagarwal87 merged 7 commits intoapache:masterfrom
aho135:segmentRangeThreshold
Sep 10, 2024
Merged

Implement segment range threshold for automatic query prioritization#17009
abhishekagarwal87 merged 7 commits intoapache:masterfrom
aho135:segmentRangeThreshold

Conversation

@aho135
Copy link
Copy Markdown
Contributor

@aho135 aho135 commented Sep 5, 2024

Description

Implements threshold based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of durationThreshold which uses the duration in the user supplied query. There are some usability constraints with using durationThreshold from the user supplied query, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps then the duration is extremely large and will always exceed the configured durationThreshold. This is one example interval from a query that specifies no end timestamp:
"interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]. This interval is generated from a query like SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR. Using the time period of the actual segments scanned allows proper prioritization without explicitly having to specify start and end timestamps. This PR adds onto #9493

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Release note

Automatic query prioritization based on the period of the actual segments scanned in a query.


Key changed/added classes in this PR
  • ThresholdBasedQueryPrioritizationStrategy
  • ThresholdBasedQueryPrioritizationStrategyTest

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@aho135
Copy link
Copy Markdown
Contributor Author

aho135 commented Sep 5, 2024

@clintropolis Would like to hear your thoughts on this since you originally authored #9493

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this seems a lot more useful than durationThreshold since it only penalizes large intervals if there is actually data present 👍

I'd actually be in favor of just changing the behavior of durationThreshold to use the logic in this PR instead of its current behavior and we can just call it out in the release notes, though i'm fine to add this as a new option too, up to you.

Comment thread docs/configuration/index.md Outdated
|`druid.query.scheduler.prioritization.segmentCountThreshold`|Number threshold for maximum number of segments that can take part in a query before its priority is automatically adjusted.|none|
|`druid.query.scheduler.prioritization.adjustment`|Amount to reduce the priority of queries which cross any threshold.|none|
|Property| Description |Default|
|--------|------------------------------------------------------------------------------------------------------------------------------|-------|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please disable the auto table formatting, i don't think we use it in most places and it isn't really needed for the website rendering

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @clintropolis I'm thinking it would be better as a separate option. It may cause confusion if someone specifies a large time range in their query and it does not get de-prioritized. And I can imagine some operators may want to just penalize any query with a large time range regardless of whether data is present

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on disabling the auto table formatting? I'm not sure how to do that

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah , i guess i was assuming you were using some kind of tool that added all of the extra formatting to the markdown table that caused all of these changes when you added the new entry (I believe intellij has such an option enabled by default). Anyway, I was just suggesting to minimize the lines changed to only the entry for the new option, and leave out the padding and extra --- on the markdown table

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh thanks for explaining. I think intellij did that, but I just pushed a change to fix the formatting!

Copy link
Copy Markdown
Contributor

@abhishekrb19 abhishekrb19 Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, the FORMAT_TABLES setting controls markdown auto-formatting. It was disabled here, so importing the code style file in your IDE should stop Intellij from auto-formatting markdown tables

@abhishekagarwal87 abhishekagarwal87 merged commit 2427972 into apache:master Sep 10, 2024
@adarshsanjeev adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants