Skip to content

Introduce defaultOnDiskStorage config for Group By#12833

Merged
suneet-s merged 9 commits intoapache:masterfrom
capistrant:defaultOnDiskStorage
Aug 12, 2022
Merged

Introduce defaultOnDiskStorage config for Group By#12833
suneet-s merged 9 commits intoapache:masterfrom
capistrant:defaultOnDiskStorage

Conversation

@capistrant
Copy link
Copy Markdown
Contributor

@capistrant capistrant commented Jul 28, 2022

Description

This config allows us the cluster operator to set an allowable maxOnDiskStorage value while also providing a default to prevent all queries from using that maximum by default if they don't override to a lower value on the client side via context. This makes sense because an operator may want to allow clients to opt in to a value between 0 and maxOnDiskStorage for their query, while not wanting all queries to use on disk storage by default.


Key changed/added classes in this PR
  • GroupByQueryConfig

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

private long maxOnDiskStorage = 0L;

@JsonProperty
private long defaultOnDiskStorage = Integer.MAX_VALUE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both maxOnDiskStorage and this new config can be changed to type of HumanReadableBytes

Comment thread docs/configuration/index.md Outdated
|`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space (approximately) to use for per-segment string dictionaries. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|100000000|
|`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for per-query string dictionaries. When the dictionary exceeds this size, a spill to disk will be triggered. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|100000000|
|`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)|
|`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries who don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries who don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`|
|`druid.query.groupBy.defaultOnDiskStorage`|Default amount of disk space to use, per-query, for spilling the result sets to disk when either the merging buffer or the dictionary fills up. Set to zero to disable disk spilling for queries which don't override `maxOnDiskStorage` in their context.|`druid.query.groupBy.maxOnDiskStorage`|

// choose a default value lower than the max allowed when the context key is missing in the client query.
newConfig.maxOnDiskStorage = Math.min(
((Number) query.getContextValue(CTX_KEY_MAX_ON_DISK_STORAGE, getMaxOnDiskStorage())).longValue(),
((Number) query.getContextValue(CTX_KEY_MAX_ON_DISK_STORAGE, getDefaultOnDiskStorage())).longValue(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both max and default config are changed to type of HumanReadableBytes, I think we need a new interface(for example getHumanReadableContextValue) on QueryContext to support values in these format:

  1. number
  2. number in string format (in case of Bad error message on incorrect context value #12760)
  3. human readable format

HumanReadableBytes.parse can handle the latter two cases.

private static final long MAX_AUTOMATIC_DICTIONARY_SIZE = 1_000_000_000;



Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidental changes?

@FrankChen021
Copy link
Copy Markdown
Member

It makes sense to me. Some minor comments are left.

private long maxOnDiskStorage = 0L;

@JsonProperty
private long defaultOnDiskStorage = Integer.MAX_VALUE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private long defaultOnDiskStorage = Integer.MAX_VALUE;
private long defaultOnDiskStorage = -1;

Since technically Integer.MAX_VALUE seems like a legit config option since defaultOnDiskStorage is a long

*/
public long getDefaultOnDiskStorage()
{
return defaultOnDiskStorage == Integer.MAX_VALUE ? getMaxOnDiskStorage() : defaultOnDiskStorage;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return defaultOnDiskStorage == Integer.MAX_VALUE ? getMaxOnDiskStorage() : defaultOnDiskStorage;
return defaultOnDiskStorage < 0 ? getMaxOnDiskStorage() : defaultOnDiskStorage;

Comment on lines +395 to +396

logger.debug(newConfig.toString());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove or add something in the log message that explains what the debug message is

Suggested change
logger.debug(newConfig.toString());

@suneet-s
Copy link
Copy Markdown
Contributor

Nice! I was just looking into something like this for a similar use case! :)

Copy link
Copy Markdown
Member

@FrankChen021 FrankChen021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes LGTM. Thank you @capistrant


boolean getContextBoolean(String key, boolean defaultValue);

HumanReadableBytes getContextHumanReadableBytes(String key, HumanReadableBytes defaultValue);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Query is marked as an ExtensionPoint I think we should introduce a default implementation of this method to help users who may have extended this class upgrade to this version of Druid without having to make a modification to their extension. The default implementation could be something like

default HumanReadableBytes getContextHumanReadableBytes(String key, HumanReadableBytes defaultValue) {
  if (getQueryContext != null) {
    return getQueryContext().getAsHumanReadableBytes(key, defaultValue);
  }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this default basically word for word. Added a return of the defaultValue if the context doesn't exist

Copy link
Copy Markdown
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great! Just the one comment about the new method in the Query interface

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

@capistrant - thank you for this feature. is this PR good to be merged? Its green :)

@capistrant
Copy link
Copy Markdown
Contributor Author

@capistrant - thank you for this feature. is this PR good to be merged? Its green :)

@abhishekagarwal87 I need to address one more thing that @suneet-s called out related to breaking changes of an extension point. Will hopefully have it done today

Copy link
Copy Markdown
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @capistrant !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants