Skip to content

Add Broker config druid.broker.segment.watchRealtimeNodes#11732

Merged
abhishekagarwal87 merged 9 commits intoapache:masterfrom
kfaraz:add_broker_config_watch_realtime
Nov 2, 2021
Merged

Add Broker config druid.broker.segment.watchRealtimeNodes#11732
abhishekagarwal87 merged 9 commits intoapache:masterfrom
kfaraz:add_broker_config_watch_realtime

Conversation

@kfaraz
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz commented Sep 22, 2021

Description

The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to skip the realtime nodes and
thus it would query only Historical processes for any given segment.

Changes

  • Add new config druid.broker.segment.watchRealtimeNodes
  • Filter servers of type REALTIME in BrokerServerView based on whether realtime nodes are watched or not

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @kfaraz

Copy link
Copy Markdown
Contributor

@techdocsmith techdocsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to simply and clarify a little bit.

Comment thread docs/configuration/index.md Outdated
return true;
// Include realtime nodes only if they are watched
return segmentWatcherConfig.isWatchRealtimeNodes()
|| metadataAndSegment.lhs.getType() != ServerType.REALTIME;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be looking for INDEXER_EXECUTOR, not sure REALTIME is actually used anymore (it was for old realtime nodes I think)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, looks like usages were removed in #7915 and just exists now for ... reasons.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for catching this! I can check for INDEXER_EXECUTOR here.

Copy link
Copy Markdown
Contributor Author

@kfaraz kfaraz Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that we are looking for INDEXER_EXECUTOR here, the nomenclature watchRealtimeNodes is also a little misleading now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also add a warning in the code. does calling isSegmentReplicationTarget() makes more sense than checking the server type?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does calling isSegmentReplicationTarget() makes more sense than checking the server type?

Hmm, that one is kinda weird too because of ServerType.BRIDGE which uh.. predates me and Druid i think, so I'm not really sure its story.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe watchRealtimeTasks would be clearer?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watchRealtimeTasks works too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed config to watchRealtimeTasks. Fixed condition to check for ServerType.INDEXER_EXECUTOR instead of ServerType.REALTIME.

Comment on lines +52 to +55
public boolean isWatchRealtimeNodes()
{
return watchRealtimeNodes;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my only real question about this PR is if this should maybe be watchedServerTypes or something and take a list of server types to watch instead of just a boolean

Copy link
Copy Markdown
Contributor Author

@kfaraz kfaraz Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same thing but then I realized that only a few combinations are possible here:
(a) INDEXER_EXECUTOR and HISTORICAL, (b) only HISTORICAL (and maybe (c) only INDEXER_EXECUTOR).

As pointed out, REALTIME has been deprecated and not being used anywhere.

Could there be other queryable types in the future (or even now that I may have missed)?. If not, then I guess with the boolean, we just miss out on option (c) above.

Please let me know what you think.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have broadcast load rules, and segment cache configured, brokers can load segments as well. They currently have independent handling in BrokerServerView (https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/client/BrokerServerView.java#L226), but i think could in theory also be filtered here.

If we did do a list, the default value should probably be like, all the server types.

Copy link
Copy Markdown
Member

@clintropolis clintropolis Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily advocating for it to be a list.. just thinking out loud. A deny list might be better if we do go that route, or only filtering if the list isn't empty or something friendly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watchedServerTypes seems too broad and technical. Trying to understand the use case behind such a config. e.g. what would a watchedServerTypes=Broker would mean?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we would set something like druid.server.tier=_realtime on MM/indexers and then not include that in the watchedTiers? Is that what you are suggesting or something else?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that exactly is what i was wondering if would work

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is worth trying. Hope it doesn't break something else 😄

Copy link
Copy Markdown
Contributor

@abhishekagarwal87 abhishekagarwal87 Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintropolis - thought about this more and while playing with tier names is a workable approach, having a separate config to exclude real-time nodes makes config management easier. Like if I am trying to build an isolated broker-historical cohort, I can set watchRealtimeNodes = false on that cohort. The rest of the cluster requires no update.

Copy link
Copy Markdown
Contributor Author

@kfaraz kfaraz Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set ignoreRealTime to true on that cohort

watchRealtimeNodes = false

kfaraz and others added 2 commits October 21, 2021 13:43
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@abhishekagarwal87 abhishekagarwal87 merged commit a22687e into apache:master Nov 2, 2021
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
@kfaraz kfaraz deleted the add_broker_config_watch_realtime branch September 30, 2022 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants