Add Broker config druid.broker.segment.watchRealtimeNodes #11732
abhishekagarwal87 merged 9 commits into apache:master
abhishekagarwal87 left a comment
LGTM. Thank you @kfaraz
techdocsmith left a comment
Tried to simplify and clarify a little bit.
    return true;
    // Include realtime nodes only if they are watched
    return segmentWatcherConfig.isWatchRealtimeNodes()
           || metadataAndSegment.lhs.getType() != ServerType.REALTIME;
This should be looking for INDEXER_EXECUTOR; I'm not sure REALTIME is actually used anymore (it was for old realtime nodes, I think).
Yeah, looks like usages were removed in #7915 and it just exists now for ... reasons.
Thanks a lot for catching this! I can check for INDEXER_EXECUTOR here.
Considering that we are looking for INDEXER_EXECUTOR here, the nomenclature watchRealtimeNodes is also a little misleading now.
We should also add a warning in the code. Does calling isSegmentReplicationTarget() make more sense than checking the server type?
Does calling isSegmentReplicationTarget() make more sense than checking the server type?
Hmm, that one is kinda weird too because of ServerType.BRIDGE, which uh... predates me and Druid I think, so I'm not really sure of its story.
maybe watchRealtimeTasks would be clearer?
watchRealtimeTasks works too.
Renamed config to watchRealtimeTasks. Fixed condition to check for ServerType.INDEXER_EXECUTOR instead of ServerType.REALTIME.
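For readers following the thread, the corrected condition can be sketched roughly as below. This is a minimal standalone illustration, not the merged code: the class RealtimeTaskFilter and its nested ServerType enum are hypothetical stand-ins for Druid's own types, and only the boolean logic is taken from the discussion above.

```java
// Hypothetical sketch of the renamed flag and the corrected server-type check.
final class RealtimeTaskFilter
{
  // Stand-in for Druid's ServerType enum (assumption: only the names matter here).
  enum ServerType { HISTORICAL, BRIDGE, INDEXER_EXECUTOR, REALTIME, BROKER }

  private final boolean watchRealtimeTasks;

  RealtimeTaskFilter(boolean watchRealtimeTasks)
  {
    this.watchRealtimeTasks = watchRealtimeTasks;
  }

  // Include servers running realtime tasks only if they are watched.
  boolean shouldWatch(ServerType type)
  {
    return watchRealtimeTasks || type != ServerType.INDEXER_EXECUTOR;
  }

  public static void main(String[] args)
  {
    RealtimeTaskFilter historicalOnly = new RealtimeTaskFilter(false);
    for (ServerType type : ServerType.values()) {
      System.out.println(type + " watched: " + historicalOnly.shouldWatch(type));
    }
  }
}
```

With the flag set to false, only INDEXER_EXECUTOR servers are skipped; every other type, including the unused REALTIME, still passes the check.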
    public boolean isWatchRealtimeNodes()
    {
      return watchRealtimeNodes;
    }
I think my only real question about this PR is if this should maybe be watchedServerTypes or something and take a list of server types to watch instead of just a boolean
I was thinking the same thing but then I realized that only a few combinations are possible here:
(a) INDEXER_EXECUTOR and HISTORICAL, (b) only HISTORICAL (and maybe (c) only INDEXER_EXECUTOR).
As pointed out, REALTIME has been deprecated and is not used anywhere.
Could there be other queryable types in the future (or even now, that I may have missed)? If not, then I guess with the boolean, we just miss out on option (c) above.
Please let me know what you think.
If you have broadcast load rules and a segment cache configured, brokers can load segments as well. They currently have independent handling in BrokerServerView (https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/client/BrokerServerView.java#L226), but I think they could in theory also be filtered here.
If we did do a list, the default value should probably be like, all the server types.
I'm not necessarily advocating for it to be a list.. just thinking out loud. A deny list might be better if we do go that route, or only filtering if the list isn't empty or something friendly.
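To make the list-based alternative concrete, here is a rough sketch of the "only filter if the list isn't empty" variant. The name watchedServerTypes comes from the comment above, but the class, the enum, and the empty-means-everything default are assumptions for illustration; nothing like this was confirmed as part of the PR.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a list-valued config instead of a boolean.
final class WatchedServerTypesSketch
{
  // Stand-in for Druid's ServerType enum (assumption).
  enum ServerType { HISTORICAL, BRIDGE, INDEXER_EXECUTOR, REALTIME, BROKER }

  // An empty set means "no filtering", so the default keeps every server type.
  static boolean shouldWatch(Set<ServerType> watchedServerTypes, ServerType type)
  {
    return watchedServerTypes.isEmpty() || watchedServerTypes.contains(type);
  }

  public static void main(String[] args)
  {
    Set<ServerType> historicalOnly = EnumSet.of(ServerType.HISTORICAL);
    System.out.println(shouldWatch(historicalOnly, ServerType.INDEXER_EXECUTOR));
    System.out.println(shouldWatch(EnumSet.noneOf(ServerType.class), ServerType.INDEXER_EXECUTOR));
  }
}
```

A deny list would simply invert the contains check; either way, the set form covers option (c) (only INDEXER_EXECUTOR), which the boolean cannot express.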
watchedServerTypes seems too broad and technical. Trying to understand the use case behind such a config, e.g. what would watchedServerTypes=Broker mean?
so we would set something like druid.server.tier=_realtime on MM/indexers and then not include that in the watchedTiers? Is that what you are suggesting or something else?
Yes, that is exactly what I was wondering would work.
it is worth trying. Hope it doesn't break something else 😄
@clintropolis - thought about this more and while playing with tier names is a workable approach, having a separate config to exclude real-time nodes makes config management easier. Like if I am trying to build an isolated broker-historical cohort, I can set watchRealtimeNodes = false on that cohort. The rest of the cluster requires no update.
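For comparison, the tier-name workaround discussed above can be sketched as follows. This is only an illustration of the idea: the class is hypothetical, the tier names are example values, and treating a null set as "watch all tiers" is an assumption about the default rather than something stated in this thread.

```java
import java.util.Set;

// Hypothetical sketch: exclude realtime tasks by giving them their own tier name.
final class WatchedTiersSketch
{
  // Assumption: a null set of watched tiers means every tier is watched.
  static boolean isWatchedTier(Set<String> watchedTiers, String tier)
  {
    return watchedTiers == null || watchedTiers.contains(tier);
  }

  public static void main(String[] args)
  {
    // Tasks would be configured with druid.server.tier=_realtime (example value),
    // and the isolated Broker cohort would list only the Historical tier.
    Set<String> cohortTiers = Set.of("_default_tier");
    System.out.println(isWatchedTier(cohortTiers, "_realtime"));
    System.out.println(isWatchedTier(cohortTiers, "_default_tier"));
  }
}
```

The drawback raised in the comment is visible here: every MiddleManager or Indexer needs the special tier name, whereas a dedicated boolean only has to be set on the isolated Broker cohort.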
set ignoreRealTime to true on that cohort
watchRealtimeNodes = false
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Description
The new config is an extension of the concept of "watchedTiers", where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, the Broker can choose to skip the realtime nodes,
so that it queries only Historical processes for any given segment.
Changes
Add Broker config druid.broker.segment.watchRealtimeNodes
Filter servers in BrokerServerView based on whether realtime nodes are watched or not