Make CompactionSearchPolicy injectable #13842
A small refactoring that makes the search policy for compaction injectable. Future changes can introduce new search policies that can be configured and injected, so that operators can choose the search policy best suited to their cluster. This will also allow us to decouple the scheduling of compaction jobs from the CompactSegments duty, allowing the coordinator to schedule compaction jobs faster than the duty lifecycle. This PR is made so that it is easy to review the future changes.
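To make "injectable" concrete, here is a minimal sketch of constructor injection in plain Java. The names `SearchPolicy`, `FirstCandidatePolicy`, and `CompactionDutySketch` are hypothetical stand-ins, not the classes touched by this PR, and intervals are modeled as plain strings for brevity.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical policy interface; the real interface in the codebase may differ.
interface SearchPolicy {
    // Returns the next interval to compact, or empty if nothing qualifies.
    Optional<String> next(List<String> candidateIntervals);
}

// A trivial policy that takes candidates in the order they were given.
class FirstCandidatePolicy implements SearchPolicy {
    @Override
    public Optional<String> next(List<String> candidateIntervals) {
        return candidateIntervals.stream().findFirst();
    }
}

// The duty receives its policy through the constructor instead of constructing
// one itself, so a DI container (or a test) decides which implementation is used.
class CompactionDutySketch {
    private final SearchPolicy policy;

    CompactionDutySketch(SearchPolicy policy) {
        this.policy = policy;
    }

    Optional<String> pickNext(List<String> candidateIntervals) {
        return policy.next(candidateIntervals);
    }
}
```

Because the duty only depends on the interface, a different policy can be swapped in without touching the duty itself.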
It makes sense to have the policy be injectable since we might have more search policies down the line.
Could you please elaborate on this part? I assume the intention here is not to make
It appears that the policy is injected, which means the same policy applies to the entire cluster. Is there a need to allow the policy to be selected per task, to better meet per-datasource needs? We'd register the set of policies, and each task would name one of them, with some default.
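The register-and-name idea could look roughly like the sketch below. `PolicyRegistry` and the policy names used with it are hypothetical, and rejecting unknown names (rather than falling back to the default) is an assumption of this sketch, not something decided in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: maps policy names to policy instances, with a default.
class PolicyRegistry<P> {
    private final Map<String, P> policies = new HashMap<>();
    private final String defaultName;

    PolicyRegistry(String defaultName, P defaultPolicy) {
        this.defaultName = defaultName;
        policies.put(defaultName, defaultPolicy);
    }

    void register(String name, P policy) {
        policies.put(name, policy);
    }

    // A task that names no policy gets the default; an unknown name is rejected
    // rather than silently replaced, so typos in task specs surface early.
    P resolve(String requestedName) {
        if (requestedName == null) {
            return policies.get(defaultName);
        }
        P policy = policies.get(requestedName);
        if (policy == null) {
            throw new IllegalArgumentException("Unknown search policy: " + requestedName);
        }
        return policy;
    }
}
```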
If CompactSegments runs faster than the segment metadata refresh interval (which I think is 1 min by default), it doesn't realize that the segments selected for compaction were already compacted, and the task fails until the metadata is refreshed. Now that the policy is available in the coordinator, the CompactSegments duty can be split into two parts: one that refreshes the iterator, which can take a long time, and another that keeps polling for the next interval available for compaction and schedules the compaction task if the cluster has capacity to do so. I'll try to write up something more detailed in the next PR.
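A rough sketch of that two-part split, under simplifying assumptions: intervals are plain strings, capacity is a fixed slot count, and `CompactionSchedulerSketch` with its methods is a hypothetical name rather than an existing class. It does not handle a refresh built from stale metadata re-adding an interval that was just submitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

class CompactionSchedulerSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int maxSlots;
    private int runningTasks = 0;

    CompactionSchedulerSketch(int maxSlots) {
        this.maxSlots = maxSlots;
    }

    // Slow path: replace the pending queue with a freshly computed, policy-ordered list.
    synchronized void refresh(List<String> orderedIntervals) {
        pending.clear();
        pending.addAll(orderedIntervals);
    }

    // Fast path: hand out the next interval only if a task slot is free.
    synchronized Optional<String> pollNext() {
        if (runningTasks >= maxSlots || pending.isEmpty()) {
            return Optional.empty();
        }
        runningTasks++;
        return Optional.of(pending.removeFirst());
    }

    // Called when a compaction task completes, freeing its slot.
    synchronized void taskFinished() {
        if (runningTasks > 0) {
            runningTasks--;
        }
    }
}
```

Since an interval leaves the queue as soon as it is handed out, the fast path cannot pick it again before the next refresh.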
This is something I had thought about working towards. Today, there is a cluster-wide configuration for the maximum number of compaction task slots, so the policy is used as a tie breaker across datasources. In the future, it could make sense for one datasource to say it prefers a search policy like "most fragmented intervals first" while another datasource cares about "newest interval first". But the cluster will need to decide which policy to apply when comparing intervals across these datasources. This sent my head spinning, so I figured I'd try to cross that bridge when I came to it.
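For illustration, the two orderings named above can be written as comparators. This is a sketch under assumptions: `IntervalStats` and `PolicyOrderings` are hypothetical, fragmentation is approximated by segment count, and interval starts are ISO-8601 date strings in one format so that string order matches time order. It applies a single ordering cluster-wide, as today, and leaves the cross-datasource question open.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-interval summary used only for this sketch.
class IntervalStats {
    final String dataSource;
    final String start;       // ISO-8601 date, e.g. "2023-01-01"
    final int segmentCount;   // stand-in for "fragmentation"

    IntervalStats(String dataSource, String start, int segmentCount) {
        this.dataSource = dataSource;
        this.start = start;
        this.segmentCount = segmentCount;
    }
}

class PolicyOrderings {
    // Intervals with the most segments come first.
    static final Comparator<IntervalStats> MOST_FRAGMENTED_FIRST =
        Comparator.comparingInt((IntervalStats s) -> s.segmentCount).reversed();

    // Intervals with the latest start come first.
    static final Comparator<IntervalStats> NEWEST_FIRST =
        Comparator.comparing((IntervalStats s) -> s.start).reversed();

    // Returns a sorted copy; the input list is left untouched.
    static List<IntervalStats> order(List<IntervalStats> intervals, Comparator<IntervalStats> policy) {
        List<IntervalStats> sorted = new ArrayList<>(intervals);
        sorted.sort(policy);
        return sorted;
    }
}
```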
Overruling Travis; the license failure was fixed in #13845.
Crept in during apache#13842. Possibly logical conflict with another PR.
This PR has: