Merge hydrant runners flatly for realtime queries.#15757
Merged
asdf2014 merged 3 commits intoapache:masterfrom Jan 25, 2024
Merged
Merge hydrant runners flatly for realtime queries.#15757asdf2014 merged 3 commits intoapache:masterfrom
asdf2014 merged 3 commits intoapache:masterfrom
Conversation
Prior to this patch, we have two layers of mergeRunners for realtime queries: one for each Sink (a logical segment) and one across all Sinks. This is done because to keep metrics and results grouped by Sink, given that each FireHydrant within a Sink has its own separate storage adapter. However, it costs extra memory usage due to the extra layer of materialization. This is especially pronounced for groupBy queries, which only use their merge buffers at the top layer of merging. The lower layer of merging materializes ResultRows directly into the heap, which can cause heap exhaustion if there are enough ResultRows. This patch changes to a single layer of merging when bySegment: false, just like Historicals. To accommodate that, segment metrics like query/segment/time are now per-FireHydrant instead of per-Sink. Two layers of merging are retained when bySegment: true. This isn't common, because it's typically only used when segment level caching is enabled on the Broker, which is off by default.
Contributor
Author
|
The metric change should be called out in docs and release notes. |
83 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Prior to this patch, we have two layers of mergeRunners for realtime queries: one for each Sink (a logical segment) and one across all Sinks. This is to keep metrics and results grouped by Sink, given that each FireHydrant within a Sink has its own separate storage adapter.
However, it costs extra memory usage due to the extra layer of materialization. This is especially pronounced for groupBy queries, which only use their merge buffers at the top layer of merging. The lower layer of merging materializes ResultRows directly into the heap, which can cause heap exhaustion if there are enough ResultRows.
This patch changes to a single layer of merging when
bySegment: false, just like Historicals. To accommodate that, segment metrics likequery/segment/timeare now per-FireHydrant instead of per-Sink.Two layers of merging are retained when
bySegment: true. This isn't common, because it's typically only used when segment level caching is enabled on the Broker, which is off by default.