Executing child queries on queriers #1730
Closed
owen-d wants to merge 97 commits into cortexproject:master
cyriltovena reviewed on Oct 19, 2019, and four times on Oct 21, 2019.
owen-d commented on Oct 30, 2019 (pkg/chunk/series_store.go, outdated):
Probably unnecessary to redeclare lookupSeriesByMetricNameMatchers, as a spanlogger with this name is created above.
owen-d commented on Oct 30, 2019 (pkg/chunk/series_store.go, outdated):
We'd still need to filter the resulting seriesIDs here. I think it can be more succinctly expressed by calculating the shard and splicing in shard labels, then running the len(matchers) == 0 logic once.
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
owen-d commented:
Deprecated in favor of #1878.
Disclaimer
This is not finished, but I'd like feedback on the design.
What
These changes aim to introduce a path towards further distributing queries. Currently Cortex dispatches queries to backend queriers, but as the throughput of metrics increases, running an entire query on a single querier can become a bottleneck.
Prior Art
Sharding
The v10 schema introduced a shard factor for data which spreads series across n shards. This is a prerequisite for allowing us to query data from these shards in parallel.
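As a rough illustration of what a shard factor means, the sketch below assigns each series to one of n shards by hashing an identifier. The helper name shardFor, the FNV hash, and the string identifiers are assumptions made for this example; the v10 schema's actual hashing scheme may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor assigns a series to one of n shards by hashing its identifier.
// Hypothetical helper: the real schema may hash different inputs with a
// different function.
func shardFor(seriesID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(seriesID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}

func main() {
	const shards = 3
	series := []string{
		`bar{baz="blip",foo="a"}`,
		`bar{baz="blip",foo="b"}`,
		`bar{baz="blip",foo="c"}`,
		`bar{baz="blip",foo="d"}`,
	}

	// Group series by shard; each group could then be read in parallel.
	byShard := make(map[uint32][]string)
	for _, s := range series {
		k := shardFor(s, shards)
		byShard[k] = append(byShard[k], s)
	}
	for k := uint32(0); k < shards; k++ {
		fmt.Printf("shard %d: %v\n", k, byShard[k])
	}
}
```

Because the assignment is deterministic, a reader that knows the shard factor can fetch each shard's series independently of the others.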
Problem
Although the v10 schema change introduces a shard factor, all shards must still be processed on a single querier. This is compounded by the fact that query ASTs are not broken down before execution. Therefore, while sharding lets us split up data at its source location, we still re-aggregate it and process it all in one place. The two goals of this PR are to fix both problems: break query ASTs down before execution, and process shards on more than one querier.
Details
Mapping ASTs
Firstly, we introduce a package called astmapper, whose interface is used to map ASTs into different ASTs. We use this to turn, e.g., sum by (foo) (rate(bar{baz="blip"}[1m])) into a sum over per-shard sub-sums.
This works largely because summing the sub-sums is equivalent. The principle should hold true for any merge operation that is associative, but sums are the initial focus.
Hijacking Queryable + embedding queries
Queries are executed in unison by a promql.Engine and a storage.Queryable. Since the Engine is a concrete type, its implementation is locked. This means that the remaining option is to hijack the queryable to dispatch child queries. However, the queryable is only called to retrieve vector and matrix selector nodes; all aggregations, functions, etc. are handled by the Engine itself. Therefore, in order to regain control of these entire subtrees, we must encode them in vector/matrix selectors. This is done by stringifying an entire subtree and replacing the node with a vector or matrix selector. Currently queries are hex-encoded, but this could be something more human-friendly.
Using our previous example with a shard factor of 3, sum by(foo) (rate(bar{baz="blip"}[1m])) is turned into a sum over three per-shard child queries, each encoded inside a selector.
The queryable implementation will look for these __embedded_query__ and __cortex_query__ labels and, upon finding them, shell out to a downstream querier with the encoded query. The Engine will then reassemble the resulting SeriesSet, applying parent operations and merging multiple child queries as necessary.
Remaining Work
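As a recap of the embedding step described above, here is a minimal sketch of hex-encoding a child query into a selector and recovering it on the queryable side. The Selector type and the Embed and Extract helpers are hypothetical, and treating __embedded_query__ as the metric name with __cortex_query__ carrying the payload is an assumption about how the two labels are used.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

const (
	embeddedQueryName = "__embedded_query__" // assumed to act as the metric name
	queryLabel        = "__cortex_query__"   // assumed to carry the hex payload
)

// Selector is a toy vector selector: a metric name plus equality matchers.
type Selector struct {
	Name   string
	Labels map[string]string
}

// Embed replaces a stringified subtree with a selector carrying it hex-encoded.
func Embed(query string) Selector {
	return Selector{
		Name:   embeddedQueryName,
		Labels: map[string]string{queryLabel: hex.EncodeToString([]byte(query))},
	}
}

// Extract reports whether sel carries an embedded query and, if so, decodes it.
func Extract(sel Selector) (query string, embedded bool, err error) {
	if sel.Name != embeddedQueryName {
		return "", false, nil
	}
	enc, ok := sel.Labels[queryLabel]
	if !ok {
		return "", false, nil
	}
	raw, err := hex.DecodeString(enc)
	if err != nil {
		return "", false, err
	}
	return string(raw), true, nil
}

func main() {
	child := `sum by (foo) (rate(bar{baz="blip"}[1m]))`
	sel := Embed(child)

	// A queryable would run this check on each selector it is asked to
	// resolve, and send the decoded query to a downstream querier.
	if q, ok, err := Extract(sel); err == nil && ok {
		fmt.Println("dispatch downstream:", q)
	}
}
```

Hex keeps the payload free of characters that would need escaping inside a label value, at the cost of readability, which matches the note that a more human-friendly encoding could replace it.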
label grouping should include shard labels
sum by (foo), when sharded, will return a vector with labels that only include foo=<value>. Due to the merge behavior of the union operator (or), this would result in discarding data in later vectors which have the same label value for foo. Therefore, the per-shard sums must also group by the shard label.
Improve AST mapping
Splitting non-sum queries
We need a way to handle shard splitting for non-parallelized queries. In the sum example, we introduce __cortex_shard__ labels in the AST and parallelize them. We need to ensure we're querying the right shards for non-sum queries as well. This may be handled either in the AST (ideal, as it isolates logic) or in the backend (e.g. a backend could detect which queries do not have __cortex_shard__ labels and fan out/collect them at that level).
Better logic for determining which subtrees to execute
implemented / this optimization will be left for a later PR
Currently, parallelizable sums will be executed on a downstream querier, but non-sum subtrees will not. In the example rate(<selector>[5m]), the selector matrix would be dispatched to a querier, but the rate would be computed over the entire series by the frontend. This is certainly suboptimal and I'll be adding more logic to correct this.
Custom impls for specific functions
implemented / this optimization will be left for a later PR
As an example, the average function is not associative and is thus difficult to combine across shards without knowing the number of data points the average was calculated over. However, average may be remapped to a sum divided by a count; sum and count are both associative and parallelize nicely.
Injecting shard configurations into the frontend
This could be done similarly to how PeriodConfigs are used to create composite stores (i.e. as a configuration in a yaml file) or some other method.
Nice to haves/Ideal Solution/etc
An interface for promql.Engine, so that evaluation wouldn't need to be handled in the storage.Queryable interface. It would remove the need to splice in selectors with encoded queries and allow for a cleaner, less obfuscated implementation. This may be a significant lift, though. As an example, the interface may look something like