Enable querying entirely cold datasources #16676
Conversation
kfaraz left a comment

This PR has some refactors which should be tackled separately, in order to facilitate a smoother review of the core changes here.
```java
  return datasourceToUnavailableSegments;
}

public Object2IntMap<String> getDatasourceToDeepStorageQueryOnlySegmentCount()
```
@findingrish , this seems like the only new method that has been added here.
Please remove this new class SegmentReplicationStatusManager and move the code back to DruidCoordinator.
If a refactor is required, please do it in a separate PR.
This PR should focus only on the required changes.
Correction: It seems that this method had already existed too.
@findingrish , is there any new code in SegmentReplicationStatusManager?
@kfaraz There is no new code in SegmentReplicationStatusManager. The reason for refactoring was a cyclic dependency between CoordinatorSegmentMetadataCache and DruidCoordinator while trying to use DruidCoordinator#getSegmentReplicationFactor.
I will raise a separate PR for the refactor.
Okay. Can you share some more details on how the cyclic dependency comes into the picture?
Currently, DruidCoordinator has a dependency on CoordinatorSegmentMetadataCache. For this patch, I need to use DruidCoordinator#getSegmentReplicationFactor in CoordinatorSegmentMetadataCache, which results in a cyclic dependency.
As a solution, I have refactored DruidCoordinator to separate out the code which updates segmentReplicationStatus and broadcastSegments.
Let me know if this solution makes sense.
@findingrish , you could just expose a method updateSegmentReplicationStatus() on CoordinatorSegmentMetadataCache. Call this method from DruidCoordinator.UpdateReplicationStatus.run() where we update broadcastSegments and segmentReplicationStatus.
Let me know if this works for you.
Yeah, this approach would work for me.
However, it seems a bit odd that DruidCoordinator.UpdateReplicationStatus has to additionally update state in some other class; ideally the consumer, CoordinatorSegmentMetadataCache, should be pulling this information?
Is there a reason to avoid the refactor work?
> Is there a reason to avoid the refactor work?

Yes, the dependencies are already all over the place, which makes the code less readable and also complicates testing. A refactor is needed here, but it would have to be thought through a little.

> However, it seems a bit odd that DruidCoordinator.UpdateReplicationStatus has to additionally update state in some other class,

Not really. You can think of the DruidCoordinator (or rather the UpdateReplicationStatus duty in this case) as sending a notification to the CoordinatorSegmentMetadataCache saying that the segment replication status has been updated. The DruidCoordinator already sends a notification to the metadata cache about leadership status; this is another notification in the same vein.
> Yes, the dependencies are already all over the place which makes the code less readable and also complicates testing. A refactor is needed here but it would have to be thought through a little.

Yes, this makes sense. DruidCoordinator refactoring would need more thought.
Thanks for the suggestion, I will update the patch.
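The suggested wiring can be sketched as follows. This is a minimal illustration only: the real Druid types (SegmentReplicationStatus, the coordinator duty classes) are stood in by strings, and the method bodies are assumptions, not the actual patch.

```java
// Sketch of the notification wiring suggested above: the coordinator duty
// pushes replication status into the cache, so the cache never needs a
// reference back to DruidCoordinator (avoiding the cyclic dependency).
class CoordinatorSegmentMetadataCacheSketch {
    private volatile String replicationStatus;

    // Called by the coordinator duty after it recomputes replication state.
    public void updateSegmentReplicationStatus(String status) {
        this.replicationStatus = status;
    }

    public String getReplicationStatus() {
        return replicationStatus;
    }
}

class UpdateReplicationStatusDutySketch {
    private final CoordinatorSegmentMetadataCacheSketch cache;

    UpdateReplicationStatusDutySketch(CoordinatorSegmentMetadataCacheSketch cache) {
        this.cache = cache;
    }

    // run() notifies the cache, mirroring how leadership notifications
    // already flow from the coordinator to the metadata cache.
    void run(String freshStatus) {
        cache.updateSegmentReplicationStatus(freshStatus);
    }
}
```

This keeps the dependency one-directional: the duty knows about the cache, but not vice versa.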
```java
/**
 * Retrieves list of used datasources.
 */
ListenableFuture<Set<String>> fetchUsedDataSources();
```
Please add the definition of used data sources here.
I have updated the method name to fetchDatasourcesWithUsedSegments to make it more understandable.
I don't think we need to document what "used segments" means, since the term is widely referenced in the code and docs. For example: https://druid.apache.org/docs/latest/api-reference/data-management-api/#mark-a-single-segment-as-used.
```diff
  * It contains schema for datasources with atleast 1 available segment.
  */
- protected final ConcurrentMap<String, T> tables = new ConcurrentHashMap<>();
+ protected final ConcurrentHashMap<String, T> tables = new ConcurrentHashMap<>();
```
Nit: Just wondering which specific ConcurrentHashMap methods you are using that required this change.
I started using computeIfAbsent method. The explanation is captured here https://github.com/code-review-checklists/java-concurrency/blob/master/README.md#chm-type.
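The point in the linked checklist item can be sketched as follows: declaring the field as the concrete ConcurrentHashMap (rather than the ConcurrentMap interface) documents that callers rely on ConcurrentHashMap-specific behavior such as its atomic computeIfAbsent. The field and method names here are illustrative, not the patch's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SchemaCacheSketch {
    // Declared as ConcurrentHashMap rather than ConcurrentMap so callers can
    // rely on the concrete type's atomic, lock-striped computeIfAbsent.
    private final ConcurrentHashMap<String, StringBuilder> tables = new ConcurrentHashMap<>();

    // The mapping function runs at most once per absent key, even under
    // contention, and the same cached instance is returned thereafter.
    public StringBuilder tableFor(String datasource) {
        return tables.computeIfAbsent(datasource, ds -> new StringBuilder(ds));
    }
}
```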
```java
coldScehmaExec = Executors.newSingleThreadScheduledExecutor(
    new ThreadFactoryBuilder()
        .setNameFormat("DruidColdSchema-ScheduledExecutor-%d")
        .setDaemon(true)
```
Why is this a daemon thread?
I will update, we don't need a daemon thread here.
```java
cacheExecFuture = cacheExec.submit(this::cacheExecLoop);
coldSchemaExecFuture = coldScehmaExec.schedule(
    this::coldDatasourceSchemaExec,
    coldSchemaExecPeriodMillis,
```
Is there a specific reason to leave these properties undocumented?
Do we have any metrics which tell us the performance of this executor service in terms of the number of cold segments backfilled?
> Do we have any metrics which tell us the performance of this executor service in terms of the number of cold segments backfilled?

We are not backfilling segments here. It is just looping over the segments, identifying cold segments and building their schema.
If the datasource schema is updated, it is logged.
Since this exec iterates over all the segments, what do we have in place to figure out how much time the execution took?
Should we publish some summary stats to improve operability?
Makes sense. I am logging details if the execution duration is greater than 50 seconds.
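The decision described here can be sketched as a simple threshold check. The constant name and the INFO/DEBUG split are assumptions for illustration (the review below suggests logging the non-slow case at debug); this is not the patch's actual code.

```java
import java.util.concurrent.TimeUnit;

public class ColdSchemaLogSketch {
    // Assumed threshold: execution stats are logged at INFO only when the
    // cold-schema pass is slow, otherwise at DEBUG.
    static final long SLOWNESS_THRESHOLD_MILLIS = TimeUnit.SECONDS.toMillis(50);

    // Chooses the level for the execution-stats log line.
    public static String levelFor(long elapsedMillis) {
        return elapsedMillis > SLOWNESS_THRESHOLD_MILLIS ? "INFO" : "DEBUG";
    }
}
```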
```java
  coldSchemaTable.keySet().retainAll(dataSources);
}

private RowSignature mergeHotAndColdSchema(RowSignature hot, RowSignature cold)
```
I am very surprised you need a new method here. There should be existing logic which does this, no?
I can refactor this a bit to have a single method for merging the RowSignature.
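For illustration, merging two schemas can be sketched with plain ordered maps standing in for Druid's RowSignature. The conflict rule shown here (columns from the hot, i.e. available, schema winning) is an assumption for the sketch, not necessarily what the patch does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeSchemaSketch {
    // Illustrative stand-in for merging a hot and a cold schema:
    // hot columns are kept as-is, cold-only columns are appended,
    // and on a name conflict the hot column's type wins.
    public static Map<String, String> mergeHotAndCold(
        Map<String, String> hot,
        Map<String, String> cold
    ) {
        Map<String, String> merged = new LinkedHashMap<>(hot);
        cold.forEach(merged::putIfAbsent);
        return merged;
    }
}
```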
```java
}

// remove any stale datasource from the map
coldSchemaTable.keySet().retainAll(dataSources);
```
Do we have a test case for this?
Yes, in CoordinatorSegmentMetadataCacheTest#testColdDatasourceSchema_verifyStaleDatasourceRemoved.
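The behavior under test can be sketched as follows: `keySet()` on a ConcurrentHashMap is a live view of the map, so `retainAll` drops every cached entry whose datasource is absent from the current snapshot in a single call. The class and parameter names are illustrative.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StaleDatasourceSketch {
    // keySet() is a live view backed by the map itself, so retainAll removes
    // all entries for datasources no longer present in the current set.
    public static void removeStale(
        ConcurrentHashMap<String, String> coldSchemaTable,
        Set<String> currentDataSources
    ) {
        coldSchemaTable.keySet().retainAll(currentDataSources);
    }
}
```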
```java
);

SegmentReplicaCount segmentReplicaCount = new SegmentReplicaCount();
segmentReplicaCount.setRequired(0, 0);
```
Wouldn't the values already be zero in a fresh instance?
Right, I was trying to be explicit about setting it to 0, so that it is clear in the test that the given segment is unavailable.
You can just add a comment to that effect and/or use test method names that clarify that point.
Invoking this method requires making it public which doesn't really seem necessary since it is only going to be used in this test.
Makes sense. Updated.
```java
/**
 * Map of datasource and generic object extending DataSourceInformation.
 * This structure can be accessed by {@link #cacheExec} and {@link #callbackExec} threads.
 * It contains schema for datasources with atleast 1 available segment.
```

Suggested change:

```diff
- * It contains schema for datasources with atleast 1 available segment.
+ * It contains schema for datasources with at least 1 available segment.
```
```java
coldSchemaTable.keySet().retainAll(dataSources);

if (stopwatch.millisElapsed() > COLD_SCHEMA_SLOWNESS_THRESHOLD_MILLIS) {
  log.info("Cold schema processing was slow, taking [%d] millis. "
```
The else branch for this should log at debug.
```java
coldSchemaTable.keySet().retainAll(dataSourceWithColdSegmentSet);

String executionStatsLog = StringUtils.format(
    "Cold schema processing was slow, taking [%d] millis. "
```

Suggested change:

```diff
- "Cold schema processing was slow, taking [%d] millis. "
+ "Cold schema processing took [%d] millis. "
```
```java
String executionStatsLog = StringUtils.format(
    "Cold schema processing was slow, taking [%d] millis. "
    + "Processed [%d] datasources, [%d] segments & [%d] datasourceWithColdSegments.",
```

Suggested change:

```diff
- + "Processed [%d] datasources, [%d] segments & [%d] datasourceWithColdSegments.",
+ + "Processed total [%d] datasources, [%d] segments. Found [%d] datasources with cold segments.",
```
Add ability to query entirely cold datasources.
Issue: #14989
Problem
Currently, the datasource schema doesn't include columns from cold segments. This makes it impossible to query an entirely cold datasource.
Approach
Backfill schema for cold segments
Leverage the existing schema backfill flow added as part of the `CentralizedDatasourceSchema` feature. Users are supposed to manually load the cold segments by setting their replication factor to 1; once the schema is backfilled (which can be verified from the metadata database), they can unload the segments.

Handling entirely cold datasources
The problem with a cold datasource is that the Broker just doesn't know about the datasource if none of its segments are available. So, the datasource wouldn't even appear on the console for querying.
We need a way for the Brokers to be aware of cold datasource, so that it can fetch its schema from the Coordinator.
Currently, brokers request schema for available datasources from Coordinator in each refresh cycle.
Brokers now first poll the set of used datasources from the Coordinator and then request their schema from the Coordinator.
Once the Broker has the schema for cold datasources, they will show up in the console and become available for querying.
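The refresh cycle described above can be sketched as follows. The method names and the use of plain strings for schemas are assumptions for illustration; the real flow lives in `BrokerSegmentMetadataCache` and uses Druid's own types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

public class BrokerRefreshSketch {
    // Hypothetical shape of a Broker refresh cycle: poll the used datasources
    // first, then fetch each datasource's schema from the Coordinator.
    public static Map<String, String> refresh(
        Supplier<Set<String>> fetchUsedDataSources,
        Function<String, String> fetchSchemaFromCoordinator
    ) {
        Map<String, String> schemas = new HashMap<>();
        for (String ds : fetchUsedDataSources.get()) {
            // Entirely cold datasources appear in the used set even with no
            // available segments, so their schema still reaches the Broker.
            schemas.put(ds, fetchSchemaFromCoordinator.apply(ds));
        }
        return schemas;
    }
}
```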
Key changes
- `CoordinatorSegmentMetadataCache`
- `BrokerSegmentMetadataCache`

Release Notes
The `CentralizedDatasourceSchema` feature needs to be enabled in order to query entirely cold datasources; it also enables querying columns that are only present in cold segments.
CentralizedDatasourceSchemafeature needs to be enabled in order to query entirely cold datasources, also it would enable querying on columns only present in cold segments.This PR has: