
Split incoming requests by day and run them in parallel.#995

Closed
tomwilkie wants to merge 1 commit into cortexproject:master from grafana:parallelise-multi-day-queries

Conversation

@tomwilkie
Contributor

@tomwilkie tomwilkie commented Sep 10, 2018

Continuation of the work proposed in https://docs.google.com/document/d/1lsvSkv0tiAMPQv-V8vI2LZ8f4i9JuTRsuPI_i-XcAqY/edit?usp=drive_web&ouid=103586900408483314805.

  • Generic code to parse incoming query_range requests, mutate them and round trip them.
  • Split queries along day boundaries, modulo step.
  • Run queries in parallel and combine their results.
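The "split along day boundaries, modulo step" idea can be sketched roughly as follows. This is an illustrative Go sketch, not the PR's actual code: `queryRange`, `nextDayBoundary`, and `splitByDay` are hypothetical names, and timestamps are Unix milliseconds as in the Prometheus HTTP API. The key point is that each sub-query's start stays on the original step grid, so the sub-results can be concatenated without resampling.

```go
package main

import "fmt"

const millisPerDay = int64(24 * 60 * 60 * 1000)

// queryRange is a hypothetical stand-in for a parsed query_range request;
// Start, End and Step are Unix milliseconds.
type queryRange struct {
	Start, End, Step int64
}

// nextDayBoundary returns the last step-aligned timestamp that still falls
// strictly before the next UTC day boundary after t ("modulo step").
func nextDayBoundary(t, step int64) int64 {
	startOfNextDay := (t/millisPerDay + 1) * millisPerDay
	// Pull the boundary back so it is a whole number of steps past t.
	target := startOfNextDay - ((startOfNextDay - t) % step)
	if target == startOfNextDay {
		target -= step
	}
	return target
}

// splitByDay splits one request into per-day sub-requests whose start times
// remain on the original step grid.
func splitByDay(r queryRange) []queryRange {
	var out []queryRange
	for start := r.Start; start < r.End; {
		end := nextDayBoundary(start, r.Step)
		if end+r.Step >= r.End {
			end = r.End
		}
		out = append(out, queryRange{start, end, r.Step})
		start = end + r.Step
	}
	return out
}

func main() {
	// A two-day range with a 1-hour step splits into two sub-queries.
	for _, q := range splitByDay(queryRange{Start: 0, End: 2 * millisPerDay, Step: 3600000}) {
		fmt.Println(q.Start, q.End)
	}
}
```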

Fixes #963, fixes #266

Need to merge https://github.com/weaveworks/common/pull/ first

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>

@tomwilkie tomwilkie force-pushed the parallelise-multi-day-queries branch from 14e98f5 to ae29f0f Compare September 11, 2018 15:31
@tomwilkie tomwilkie changed the title [WIP] Split incoming requests by day and run them in parallel. Split incoming requests by day and run them in parallel. Sep 13, 2018
f.IntVar(&cfg.MaxOutstandingPerTenant, "querier.max-outstanding-requests-per-tenant", 100, "Maximum number of outstanding requests per tenant per frontend; requests beyond this error with HTTP 429.")
f.IntVar(&cfg.MaxRetries, "querier.max-retries-per-request", 5, "Maximum number of retries for a single request; beyond this, the downstream error is returned.")
f.BoolVar(&cfg.SplitQueriesByDay, "querier.split-queries-by-day", false, "Split queries by day and execute in parallel.")
Contributor

Why day? It's a nice time boundary, but I'm curious if there was reasoning here that makes it the on/off choice, versus being able to specify the range to split on. If someone was using periodic tables with a week range, is there a difference to splitting the query over that time boundary instead?

Contributor Author

Good question! The rows in the index are organised by day, and sub-day queries pretty much have to read an entire day's index entries anyway. Therefore might as well parallelise by day IMO.

We're running this in prod now, and it actually looks like for high-cardinality queries, where the execution of the PromQL is the dominant latency, sub-day parallelism might be worthwhile. In the future we should probably make this tuneable.
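Making the split interval tuneable, as suggested above, would mean generalising the day split to an arbitrary interval. A hypothetical sketch, again with invented names (`splitByInterval`, `nextBoundary`) and Unix-millisecond timestamps, where the interval would come from a config flag:

```go
package main

import "fmt"

type queryRange struct {
	Start, End, Step int64
}

// nextBoundary returns the last step-aligned timestamp before the next
// multiple of interval after t. With interval = 24h in milliseconds this
// reduces to the day-boundary case.
func nextBoundary(t, step, interval int64) int64 {
	next := (t/interval + 1) * interval
	target := next - ((next - t) % step)
	if target == next {
		target -= step
	}
	return target
}

// splitByInterval generalises the day split to an arbitrary interval,
// which is what a future tunable flag might control.
func splitByInterval(r queryRange, interval int64) []queryRange {
	var out []queryRange
	for start := r.Start; start < r.End; {
		end := nextBoundary(start, r.Step, interval)
		if end+r.Step >= r.End {
			end = r.End
		}
		out = append(out, queryRange{start, end, r.Step})
		start = end + r.Step
	}
	return out
}

func main() {
	// 6 hours of data, split on 1-hour boundaries with a 10-minute step.
	reqs := splitByInterval(queryRange{Start: 0, End: 21600000, Step: 600000}, 3600000)
	fmt.Println(len(reqs))
}
```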

- Generic code to parse incoming query_range requests, mutate them and round trip them.
- Split queries along day boundaries, modulo step.
- Run queries in parallel and combine their results.
- Ensure we propagate org ids correctly; add e2e tests.
- Take care to ensure we propagate trace IDs correctly; involved updating weaveworks/common.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
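The "run queries in parallel and combine their results" step from the commit message can be sketched with a simple indexed fan-out. This is an illustrative sketch, not the PR's middleware: `doParallel` and the string-typed requests are stand-ins (the real code round-trips HTTP requests and merges Prometheus matrices), but it shows the shape: fan out concurrently, then recombine responses in their original time order so the merged result stays sorted.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a sub-response with its error; the slice index preserves
// the original time order of the sub-requests.
type result struct {
	data string
	err  error
}

// doParallel issues every sub-request concurrently via do, waits for all
// of them, and returns the responses in request order, failing on the
// first error found.
func doParallel(reqs []string, do func(string) (string, error)) ([]string, error) {
	results := make([]result, len(reqs))
	var wg sync.WaitGroup
	for i, r := range reqs {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			data, err := do(r)
			results[i] = result{data: data, err: err}
		}(i, r)
	}
	wg.Wait()

	out := make([]string, 0, len(reqs))
	for _, res := range results {
		if res.err != nil {
			return nil, res.err
		}
		out = append(out, res.data)
	}
	return out, nil
}

func main() {
	out, err := doParallel([]string{"day1", "day2"}, func(s string) (string, error) {
		return "result:" + s, nil
	})
	fmt.Println(out, err)
}
```

Writing each goroutine's result into its own slice slot avoids both locking and a post-hoc sort: order is fixed by construction regardless of which sub-query finishes first.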
@tomwilkie tomwilkie force-pushed the parallelise-multi-day-queries branch from af16d46 to 3502fb4 Compare September 24, 2018 10:13
@tomwilkie
Contributor Author

Going to roll this into #1029, as there are review comments there.

@tomwilkie tomwilkie closed this Sep 26, 2018
@tomwilkie tomwilkie deleted the parallelise-multi-day-queries branch October 9, 2018 11:31


Development

Successfully merging this pull request may close these issues.

Parallelise queries across "workers" by time range, for long queries
500s while querying longish ranges
