Limit to 3 pipelines per node per source #5792
Merged
fulmicoton-dd merged 2 commits into impr-shard-collocation on Jun 27, 2025
Conversation
Collaborator
why 3?
Collaborator
Author
It's a ratio proposed by Paul here. The rationale is that if you start saturating the system, merges being 3x faster than indexing, with this ratio merges would be able to keep up.
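The 3x argument can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming merging processes data roughly 3x faster than indexing produces it (the rates and the `merges_keep_up` helper below are illustrative, not Quickwit code):

```rust
// Back-of-the-envelope check of the 3x ratio (all numbers hypothetical).
// If merging is ~3x faster than indexing, a single merge pipeline can absorb
// the output of up to 3 saturated indexing pipelines on the same node.
fn merges_keep_up(num_indexing_pipelines: u32) -> bool {
    // One saturated indexing pipeline produces split data at rate 1.
    let indexing_rate = num_indexing_pipelines;
    // A merge pipeline consumes split data at rate ~3.
    let merge_rate = 3;
    indexing_rate <= merge_rate
}

fn main() {
    assert!(merges_keep_up(3)); // 3 pipelines: merges just keep pace
    assert!(!merges_keep_up(4)); // a 4th pipeline would outrun merging
}
```

In other words, 3 is the break-even point under this model: beyond it, merge work accumulates faster than it can be absorbed.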
Collaborator
Author
There are 2 issues with the logic so far:
EDIT: 2) is solved in #5808
Collaborator
Author
Actually this is wrong for now. I thought shards in the simplified problem were indexing pipelines in the physical plan, but that's not true: shards in the simplified problem are physical shards, and they are mapped to pipelines in
EDIT: solved now
rdettai commented on Jun 12, 2025 · Comment on lines +359 to +414
```rust
// To ensure that merges can keep up, we try not to assign more than 3
// pipelines per indexer for a source (except if there aren't enough nodes).
let target_limit_num_shards_per_indexer_per_source =
    3 * MAX_LOAD_PER_PIPELINE.cpu_millis() / source.load_per_shard.get();
```
Collaborator
Author
This is creating some undesired coupling with the rest of the code:
- we rely on convert_scheduling_solution_to_physical_plan to use the same MAX_LOAD_PER_PIPELINE to create the right number of pipelines
- we rely on the default load_per_pipeline for non-ingest sources (e.g. Kafka) to also use MAX_LOAD_PER_PIPELINE
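To make the formula concrete, here is a standalone sketch of the limit computation with made-up numbers. The value of MAX_LOAD_PER_PIPELINE below is a hypothetical placeholder, not the real Quickwit constant, and the free function stands in for the inline expression in the diff:

```rust
// Hypothetical capacity of one indexing pipeline, in cpu_millis.
// (Placeholder value; the real MAX_LOAD_PER_PIPELINE lives in Quickwit's
// scheduling code and may differ.)
const MAX_LOAD_PER_PIPELINE_CPU_MILLIS: u32 = 4_000;

// Cap the shards of a source on one indexer to 3 pipelines' worth of
// capacity, expressed in shards of that source.
fn target_limit_num_shards_per_indexer_per_source(load_per_shard_cpu_millis: u32) -> u32 {
    3 * MAX_LOAD_PER_PIPELINE_CPU_MILLIS / load_per_shard_cpu_millis
}

fn main() {
    // Light shards (250 cpu_millis each): 3 * 4000 / 250 = 48 shards.
    assert_eq!(target_limit_num_shards_per_indexer_per_source(250), 48);
    // Heavy shards (1000 cpu_millis each): 3 * 4000 / 1000 = 12 shards.
    assert_eq!(target_limit_num_shards_per_indexer_per_source(1_000), 12);
}
```

This also illustrates the coupling concern above: the limit is only "3 pipelines' worth" if the code that actually spawns pipelines sizes them with the same MAX_LOAD_PER_PIPELINE constant.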
fulmicoton-dd approved these changes on Jun 27, 2025
rdettai added a commit that referenced this pull request on Jun 30, 2025: Add limit of three pipelines per node; small code simplification
rdettai added a commit that referenced this pull request on Jul 10, 2025: Add limit of three pipelines per node; small code simplification
otetard pushed a commit to SekoiaLab/quickwit that referenced this pull request on Jul 18, 2025: Add limit of three pipelines per node; small code simplification
rdettai-sk pushed a commit to SekoiaLab/quickwit that referenced this pull request on Oct 29, 2025: Add limit of three pipelines per node; small code simplification
Description
Closes #4470
Closes #5747
Closes #4630
This addresses both the limitation that only 1 merge pipeline can run per indexer at any given time, and the fact that nodes systematically end up with all the pipelines of a source when using Kafka, even when the number of pipelines for that source is rather large.
How was this PR tested?
Added a unit test; more should be added.