Introduce defaultOnDiskStorage config for Group By#12833
Introduce defaultOnDiskStorage config for Group By#12833suneet-s merged 9 commits intoapache:masterfrom
Conversation
| private long maxOnDiskStorage = 0L; | ||
|
|
||
| @JsonProperty | ||
| private long defaultOnDiskStorage = Integer.MAX_VALUE; |
There was a problem hiding this comment.
Both maxOnDiskStorage and this new config can be changed to type of HumanReadableBytes
| |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space (approximately) to use for per-segment string dictionaries. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|100000000| | ||
| |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for per-query string dictionaries. When the dictionary exceeds this size, a spill to disk will be triggered. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|100000000| | ||
| |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)| | ||
| |`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries who don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`| |
There was a problem hiding this comment.
| |`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries who don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`| | |
| |`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries which don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`| |
| // choose a default value lower than the max allowed when the context key is missing in the client query. | ||
| newConfig.maxOnDiskStorage = Math.min( | ||
| ((Number) query.getContextValue(CTX_KEY_MAX_ON_DISK_STORAGE, getMaxOnDiskStorage())).longValue(), | ||
| ((Number) query.getContextValue(CTX_KEY_MAX_ON_DISK_STORAGE, getDefaultOnDiskStorage())).longValue(), |
There was a problem hiding this comment.
If both max and default config are changed to type of HumanReadableBytes, I think we need a new interface(for example getHumanReadableContextValue) on QueryContext to support values in these format:
- number
- number in string format (in case of Bad error message on incorrect context value #12760)
- human readable format
HumanReadableBytes.parse can handle the latter two cases.
| private static final long MAX_AUTOMATIC_DICTIONARY_SIZE = 1_000_000_000; | ||
|
|
||
|
|
||
|
|
|
It makes sense to me. Some minor comments are left. |
| private long maxOnDiskStorage = 0L; | ||
|
|
||
| @JsonProperty | ||
| private long defaultOnDiskStorage = Integer.MAX_VALUE; |
There was a problem hiding this comment.
| private long defaultOnDiskStorage = Integer.MAX_VALUE; | |
| private long defaultOnDiskStorage = -1; |
Since technically Integer.MAX_VALUE seems like a legit config option since defaultOnDiskStorage is a long
| */ | ||
| public long getDefaultOnDiskStorage() | ||
| { | ||
| return defaultOnDiskStorage == Integer.MAX_VALUE ? getMaxOnDiskStorage() : defaultOnDiskStorage; |
There was a problem hiding this comment.
| return defaultOnDiskStorage == Integer.MAX_VALUE ? getMaxOnDiskStorage() : defaultOnDiskStorage; | |
| return defaultOnDiskStorage < 0 ? getMaxOnDiskStorage() : defaultOnDiskStorage; |
|
|
||
| logger.debug(newConfig.toString()); |
There was a problem hiding this comment.
nit: remove or add something in the log message that explains what the debug message is
| logger.debug(newConfig.toString()); |
|
Nice! I was just looking into something like this for a similar use case! :) |
FrankChen021
left a comment
There was a problem hiding this comment.
All changes LGTM. Thank you @capistrant
|
|
||
| boolean getContextBoolean(String key, boolean defaultValue); | ||
|
|
||
| HumanReadableBytes getContextHumanReadableBytes(String key, HumanReadableBytes defaultValue); |
There was a problem hiding this comment.
Since Query is marked as an ExtensionPoint I think we should introduce a default implementation of this method to help users who may have extended this class upgrade to this version of Druid without having to make a modification to their extension. The default implementation could be something like
default HumanReadableBytes getContextHumanReadableBytes(String key, HumanReadableBytes defaultValue) {
if (getQueryContext != null) {
return getQueryContext().getAsHumanReadableBytes(key, defaultValue);
}
}
There was a problem hiding this comment.
Added this default basically word for word. Added a return of the defaultValue if the context doesn't exist
suneet-s
left a comment
There was a problem hiding this comment.
Overall looks great! Just the one comment about the new method in the Query interface
|
@capistrant - thank you for this feature. is this PR good to be merged? Its green :) |
@abhishekagarwal87 I need to address one more thing that @suneet-s called out related to breaking changes of an extension point. Will hopefully have it done today |
suneet-s
left a comment
There was a problem hiding this comment.
LGTM
Thanks @capistrant !
Description
This config allows us the cluster operator to set an allowable maxOnDiskStorage value while also providing a default to prevent all queries from using that maximum by default if they don't override to a lower value on the client side via context. This makes sense because an operator may want to allow clients to opt in to a value between 0 and maxOnDiskStorage for their query, while not wanting all queries to use on disk storage by default.
Key changed/added classes in this PR
This PR has: