Skip to content

Better groupBy error messages and docs around resource limits.#4162

Merged
fjy merged 3 commits intoapache:masterfrom
gianm:groupby-docs
Apr 13, 2017
Merged

Better groupBy error messages and docs around resource limits.#4162
fjy merged 3 commits intoapache:masterfrom
gianm:groupby-docs

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Apr 12, 2017

Improvement on #4046, based on complaints that it was hard to understand how to tune groupBy v2 for a particular query load.

@gianm gianm added this to the 0.10.1 milestone Apr 12, 2017
@gianm gianm requested review from himanshug and jon-wei April 12, 2017 00:17
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Apr 12, 2017

👍

Copy link
Copy Markdown
Member

@nishantmonu51 nishantmonu51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , left some minor doc comments

Comment thread docs/content/querying/groupbyquery.md Outdated
- druid.query.groupBy.maxMergingDictionarySize: size of the on-heap dictionary used when grouping on strings, per query,
in bytes. Note that this is based on a rough estimate of the dictionary size, not the actual size.
- druid.query.groupBy.maxOnDiskStorage: amount of space on disk used for aggregation, per query, in bytes. By default,
this is 0, which means all aggregation is done in-memory.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'in-memory' refers to on-heap or offheap memory here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's both. Clarified.

Comment thread docs/content/querying/groupbyquery.md Outdated
aggregation table limit, will fail with a "Resource limit exceeded" error describing the limit that was exceeded. If
maxOnDiskStorage is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
Queries that then also exceed maxOnDiskStorage will fail with a "Resource limit exceeded" error indicating that they
ran out of disk space.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are both on-heap dictionary and off-heap aggregation table both flushed to disk ? can we clarify this in docs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified.

private static final AggregateResult DICTIONARY_FULL = AggregateResult.failure(
"Not enough dictionary space to execute this query. Try increasing "
+ "druid.query.groupBy.maxMergingDictionarySize or enable disk spilling by setting "
+ "druid.query.groupBy.maxOnDiskStorage to a positive number."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/positive/larger , user might already have a positive number in there

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error will only be seen by the user if maxOnDiskStorage is 0, so "positive" does make sense. If it's > 0 they'll see a different error, about disk space being full.

private static final AggregateResult HASHTABLE_FULL = AggregateResult.failure(
"Not enough aggregation table space to execute this query. Try increasing "
+ "druid.processing.buffer.sizeBytes or enable disk spilling by setting "
+ "druid.query.groupBy.maxOnDiskStorage to a positive number."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

larger

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@himanshug
Copy link
Copy Markdown
Contributor

👍 aside from minor doc comments

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Apr 12, 2017

Re-pushed with some edits.

Copy link
Copy Markdown
Contributor

@jon-wei jon-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, had one minor comment

In addition, groupBy v1 merges results on-heap, whereas groupBy v2 merges results off-heap. These factors mean that
memory tuning and resource limits behave differently between v1 and v2. In particular, due to this, some queries
that can complete successfully in one engine may exceed resource limits and fail with the other engine. See the
"Memory tuning and resource limits" section for more details.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe turn this into a link to that section

@fjy fjy merged commit b2954d5 into apache:master Apr 13, 2017
gianm added a commit to implydata/druid-public that referenced this pull request Apr 13, 2017
…e#4162)

* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
@gianm gianm deleted the groupby-docs branch September 23, 2022 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants