Implement segment range threshold for automatic query prioritization #17009
abhishekagarwal87 merged 7 commits into apache:master
Conversation
@clintropolis Would like to hear your thoughts on this since you originally authored #9493.
clintropolis left a comment
nice, this seems a lot more useful than durationThreshold since it only penalizes large intervals if there is actually data present 👍
I'd actually be in favor of just changing the behavior of durationThreshold to use the logic in this PR instead of its current behavior and we can just call it out in the release notes, though i'm fine to add this as a new option too, up to you.
|`druid.query.scheduler.prioritization.segmentCountThreshold`|Number threshold for maximum number of segments that can take part in a query before its priority is automatically adjusted.|none|
|`druid.query.scheduler.prioritization.adjustment`|Amount to reduce the priority of queries which cross any threshold.|none|
|Property| Description |Default|
|--------|------------------------------------------------------------------------------------------------------------------------------|-------|
nit: please disable the auto table formatting, i don't think we use it in most places and it isn't really needed for the website rendering
Thanks for the review @clintropolis. I'm thinking it would be better as a separate option. It may cause confusion if someone specifies a large time range in their query and it does not get de-prioritized. And I can imagine some operators may want to just penalize any query with a large time range, regardless of whether data is present.
Could you elaborate on disabling the auto table formatting? I'm not sure how to do that
ah, i guess i was assuming you were using some kind of tool that added all of the extra formatting to the markdown table, which caused all of these changes when you added the new entry (I believe IntelliJ has such an option enabled by default). Anyway, I was just suggesting to minimize the lines changed to only the entry for the new option, and leave out the padding and extra --- on the markdown table
Ahh thanks for explaining. I think intellij did that, but I just pushed a change to fix the formatting!
Description
Implements threshold-based automatic query prioritization using the time period of the actual segments scanned. This differs from the current implementation of `durationThreshold`, which uses the duration of the interval in the user-supplied query. Relying on the user-supplied interval has some usability constraints, especially when using SQL. For example, if a client does not explicitly specify both start and end timestamps, the duration is extremely large and will always exceed the configured `durationThreshold`. This is one example interval from a query that specifies no end timestamp: `"interval":["2024-08-30T08:05:41.944Z/146140482-04-24T15:36:27.903Z"]`. This interval is generated from a query like `SELECT * FROM table WHERE __time > CURRENT_TIMESTAMP - INTERVAL '15' HOUR`. Using the time period of the actual segments scanned allows proper prioritization without having to explicitly specify start and end timestamps. This PR builds on #9493.
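To make the difference concrete, here is a minimal, self-contained sketch of the two measurements. It is not the code in this PR: the `SegmentRangeSketch` class, its `Interval` type, and the `segmentRange` and `adjust` helpers are hypothetical names for illustration, and the sketch assumes that "crossing" a threshold means strictly exceeding it and that the segment range runs from the earliest segment start to the latest segment end.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class SegmentRangeSketch {
    /** Hypothetical stand-in for a time interval; not a Druid class. */
    static final class Interval {
        final Instant start;
        final Instant end;

        Interval(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Span from the earliest segment start to the latest segment end; zero if no segments. */
    static Duration segmentRange(List<Interval> segments) {
        if (segments.isEmpty()) {
            return Duration.ZERO;
        }
        Instant min = segments.get(0).start;
        Instant max = segments.get(0).end;
        for (Interval s : segments) {
            if (s.start.isBefore(min)) {
                min = s.start;
            }
            if (s.end.isAfter(max)) {
                max = s.end;
            }
        }
        return Duration.between(min, max);
    }

    /** Lowers the priority by `adjustment` when the measured duration exceeds the threshold (assumed strict). */
    static int adjust(int priority, Duration measured, Duration threshold, int adjustment) {
        return measured.compareTo(threshold) > 0 ? priority - adjustment : priority;
    }

    public static void main(String[] args) {
        // Query interval with no end timestamp: the upper bound is a far-future
        // instant used here as a stand-in for the open end.
        Interval query = new Interval(
            Instant.parse("2024-08-30T08:05:41.944Z"),
            Instant.ofEpochMilli(Long.MAX_VALUE / 2));

        // Segments that actually hold data for the last 15 hours (illustrative values).
        List<Interval> segments = List.of(
            new Interval(Instant.parse("2024-08-30T08:00:00Z"), Instant.parse("2024-08-30T16:00:00Z")),
            new Interval(Instant.parse("2024-08-30T16:00:00Z"), Instant.parse("2024-08-31T00:00:00Z")));

        Duration threshold = Duration.ofDays(1);
        int byQueryDuration = adjust(0, Duration.between(query.start, query.end), threshold, 5);
        int bySegmentRange = adjust(0, segmentRange(segments), threshold, 5);

        System.out.println("priority using query duration: " + byQueryDuration);
        System.out.println("priority using segment range:  " + bySegmentRange);
    }
}
```

With a one-day threshold, the open-ended query interval is penalized while the 16-hour span of the scanned segments is not, which is the behavior the description argues for.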
Release note
Automatic query prioritization based on the period of the actual segments scanned in a query.
Key changed/added classes in this PR
ThresholdBasedQueryPrioritizationStrategy
ThresholdBasedQueryPrioritizationStrategyTest
This PR has: