Skip to content

RunWorkOrder: Account for two simultaneous statistics collectors.#17216

Merged
abhishekagarwal87 merged 4 commits intoapache:masterfrom
gianm:msq-mem-div2
Oct 3, 2024
Merged

RunWorkOrder: Account for two simultaneous statistics collectors.#17216
abhishekagarwal87 merged 4 commits intoapache:masterfrom
gianm:msq-mem-div2

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Oct 2, 2024

As a follow up to #17057, divide the amount of partitionStatsMemory by two, to account for the fact that there are at some times going to be two copies of the full collector. First there will be one for processors and one for the accumulated collector. Then, after the processor ones are GCed, a snapshot of the accumulated collector will be created.

Also includes an optimization to "addAll" for the two KeyCollectors, for the case where we're adding into an empty collector. This is always going to happen once per stage due to the "withAccumulation" call.

This fixes an OOME @vogievetsky noticed during local testing with bin/start-druid.

As a follow up to apache#17057, divide the amount of partitionStatsMemory
by two, to account for the fact that there are at some times going to
be two copies of the full collector. First there will be one for processors
and one for the accumulated collector. Then, after the processor ones are
GCed, a snapshot of the accumulated collector will be created.

Also includes an optimization to "addAll" for the two KeyCollectors,
for the case where we're adding into an empty collector. This is always
going to happen once per stage due to the "withAccumulation" call.
@gianm gianm added the Bug label Oct 2, 2024
@gianm gianm added this to the 31.0.0 milestone Oct 2, 2024
@github-actions github-actions Bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 2, 2024
@vogievetsky
Copy link
Copy Markdown
Contributor

I ran into this bug :-p

@abhishekagarwal87 abhishekagarwal87 merged commit 316f8c8 into apache:master Oct 3, 2024
kfaraz pushed a commit to kfaraz/druid that referenced this pull request Oct 4, 2024
…ache#17216)

* RunWorkOrder: Account for two simultaneous statistics collectors.

As a follow up to apache#17057, divide the amount of partitionStatsMemory
by two, to account for the fact that there are at some times going to
be two copies of the full collector. First there will be one for processors
and one for the accumulated collector. Then, after the processor ones are
GCed, a snapshot of the accumulated collector will be created.

Also includes an optimization to "addAll" for the two KeyCollectors,
for the case where we're adding into an empty collector. This is always
going to happen once per stage due to the "withAccumulation" call.

* Fix missing variable.

* Don't divide by numProcessingThreads twice.

* Fix test.
kfaraz added a commit that referenced this pull request Oct 5, 2024
…) (#17251)

* SQL: Use regular filters for time filtering in subqueries. (#17173)
* RunWorkOrder: Account for two simultaneous statistics collectors. (#17216)
* DartTableInputSpecSlicer: Fix for TLS workers. (#17224)
* Upgrade avro - minor version (#17230)
* SuperSorter: Don't set allDone if it's already set. (#17238)
* Decoupled planning: improve join support (#17039)
---------
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants