Support for bootstrap segments#16609
Conversation
- Adds a new API in the coordinator.
- All processes that have storage locations configured (including tasks)
talk to the coordinator if they can, and fetch bootstrap segments from it.
- Then load the segments onto the segment cache as part of startup.
- This addresses the segment bootstrapping logic required by processes before
they can start serving queries or ingesting.
This patch also lays the foundation to speed up upgrades.
c9dcb02 to
a4ed4b1
Compare
9034d84 to
7459a8a
Compare
The rules aren't evaluated if there are no clusters.
|
Thanks for the feature, @abhishekrb19 ! It would be very useful for task/historical startup. |
…entLoadDropHandler.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
d746aee to
04793b6
Compare
|
Thanks for the review, @kfaraz! I've responded to the comments and added code comments/tests where applicable for further clarification. Could you take another look? |
kfaraz
left a comment
There was a problem hiding this comment.
Final nitpicks, rest looks good.
| bootstrapSegments = ImmutableList.copyOf(iterator); | ||
| try { | ||
| final BootstrapSegmentsInfo response = | ||
| FutureUtils.getUnchecked(coordinatorClient.fetchBootstrapSegments(), true); |
There was a problem hiding this comment.
Since this is going to block startup, should there be a timeout?
There was a problem hiding this comment.
Yeah, this uses the CoordinatorClient injected here, which has a standard retry policy with a max of 6 retries (~3 seconds total wait time) on transient errors. So processes will come up after ~3 seconds if the failures are persistent. There's one test testLoadBootstrapSegmentsWhenExceptionThrown() in this PR that verifies this fail open strategy that the server just comes up when an error occurs talking to the coordinator during startup.
I plan to add an optional fail close strategy for tasks in a follow up, which will likely need a different retry policy than the standard retry policy in this patch. I will try adding some more scenarios to verify the different behaviors like startup doesn't block indefinitely, etc.
There was a problem hiding this comment.
Sounds good. What about success scenarios, where a historical or task is busy loading a lot of segments and thus may not respond to liveness probes?
There was a problem hiding this comment.
There is currently no overall startup timeout. If this is an issue, particularly for tasks, we may want to adjust the bootstrapper such that it can at least respond to liveness probes.
| emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/time", fetchRunMillis)); | ||
| emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/count", bootstrapSegments.size())); | ||
| emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/time", fetchRunMillis)); | ||
| emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/count", bootstrapSegments.size())); |
There was a problem hiding this comment.
[Non blocker] It would be nice to add the dataSource dimension to this metric.
There was a problem hiding this comment.
Good idea, I will save this for a future patch
Problem
Currently, broadcast segments are assigned to data servers by the coordinator. This usual route alone can cause data correctness issues because if a process (either a task or server) is using broadcast segments for ingestion or to serve queries, the ingestion or query results can be invalid due to the asynchronous nature of segment assignment and loading.
Description
This PR builds on top of #16475.
To address this data correctness problem, a process when it starts up attempts to fetch and load any broadcast (bootstrap) segments from the coordinator first before it can proceed. If there are any errors in this bootstrapping process talking to the coordinator, the process just comes up in a "fail open" state and doesn't block start up.
Changes:
/v1/metadata/bootstrapSegmentsin the coordinator that returns bootstrap segments. Currently, the set of bootstrap segments only contains broadcast segments. In the future, we can expand on this mechanism to speed up historical upgrades as well.SegmentLoadDropHandler. The handler then loads any previously cached segments along with the bootstrap segments onto its storage location.Misc changes:
@JacksonInject PruneSpecsHolderargument in theLoadableDataSegmentconstructor. It always defaults toPruneSpecsHolder.DEFAULTand doesn't depend on the injected valueLocalDataSegmentPullerDruidExceptionFuture changes
Release note
Added a coordinator API
POST /v1/metadata/bootstrapSegmentsthat returns the set of bootstrap segments. Currently, the set only contains broadcast segments. When a process (either a server or task) comes up, it attempts to query the coordinator to fetch all bootstrap segments and loads them before the process can proceed doing its thing. This primarily addresses a data correctness issue where a process doesn't have broadcast segments assigned yet and starts processing data. In the event of any errors, the process just comes up in a fail open state and doesn't block start up.To this effect, new metrics have been added in the bootstrap segments flow:
segment/bootstrap/time: Total time taken to fetch bootstrap segments from the coordinatorsegment/bootstrap/count: Total count of bootstrap segments fetched from the coordinatorThis PR has: