[Bug Fix] Broker will not wait for its SQL metadata view to fully initialize before starting up, even though set awaitInitializationOnStart true#10779
Conversation
| if (isServerViewInitialized) { | ||
| // lastFailure != 0L means exceptions happened before and there're some refresh work was not completed. | ||
| // so that even ServerView is initialized, we can't let broker complete initialization. | ||
| if (isServerViewInitialized && lastFailure == 0L) { |
There was a problem hiding this comment.
I think this is saying that we will mark as initialized when all of these are true:
(a) no refresh is needed
(b) the server view is initialized
(c) there wasn't a refresh that just failed
Sounds good to me - thanks for the fix!
Do you think you could add a test for this case too? Maybe something like DruidSchemaTest, but that creates a DruidSchema using a QueryLifecycleFactory where the first query fails fail, and then after that they start succeeding. We'd want to make sure that when awaitInitialization() returns, things are really initialized.
|
Hi @gianm Thanks for your review. I just add a UT named If condition is |
Summary: Pull upstream bug fix for broker not waiting for Druid schema to init Context: For large Druid clusters with a lot of data sources, we see SQL queries hit exceptions of 'unkonwn Druid table ABC' right after the broker host is restarted or new host is up, this is because there is a bug that broker hosts don't wait for Druid schema to init before declaring itself as ready. This PR has two changes: Pull an upstream bug fix apache#10779 [Bug Fix] Broker will not wait for its SQL metadata view to fully initialize before starting up, even though set awaitInitializationOnStart true This fix will make sure broker doesn't declare itself as ready before it init the schema for all the Druid tables: /status/health REST api will not prematurely return true and broker won't prematurely register itself in ZK. Add a config to control the number of segments per query when doing metadata refresh. Note that in the Druid schema init stage above, it does not guarantee to wait for metadata of ALL segments are refreshed. Rather, it waits for the first batch of segments' metadata for each data source is refreshed to make sure we have the latest schema ready then declares the Druid schema init as finished. The metadata refreshment of the remaining segments will be done asyncly afterwards. The metadata refreshment is also done from the newest segment to the oldest segment, so after the Druid schema init, we will have the schema for the latest segments, which is usually enough. The default batch size 15000 causes frequent time out from data nodes which delays Druid schema init and broker bootstrap. When a data source has 20000 segments, 15000 is not that different from 5000 if we only care about the latest schemas so it's OK to decrease. Test Plan: Tested on a canary host in Insights cluster, The broker waits for all Druid data source's latest schemas to be ready before declaring itself as ready. Changed the METADATA_REFRESH_MAX_SEGMENTS_PER_QUERY from 15000 to 5000, the Druid schema init decreased from 3.3 minutes to 18 seconds Reviewers: O1139 Druid, itallam, yyang Reviewed By: O1139 Druid, itallam, yyang Subscribers: shawncao, realtime-analytics Differential Revision: https://phabricator.pinadmin.com/D921875 Commit: 46de8df5fd30d2f15684da62278d616823d5dfcb Signed-off-by: ssagare <ssagare@pinterest.com>
druid.sql.planner.awaitInitializationOnStartis used for control whether the Broker will wait for its SQL metadata view to fully initialize before starting up.And PR #6765 was merge to solve the bug that
Start up DruidSchema immediately if there are no segmentsBut this PR brings another bug that Broker will not wait for its SQL metadata view to fully initialize before starting up, even though set druid.sql.planner.awaitInitializationOnStart true.
Description
From the design view, if exceptions threw in
druid/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchema.java
Line 217 in 64f97e7
Broker will do retry in next loop.
Code here
druid/sql/src/main/java/org/apache/druid/sql/calcite/schema/DruidSchema.java
Line 245 in 64f97e7
break this design. In next loop,
isServerViewInitializedis true and exit the loop directly, but there are still several refresh work need to be done.Here is the full logs of this bug:
As you can see,
DruidSchema initialized in [76,537] ms.DruidSchema initialized is completed immediately, even though there are exceptions which need a retry. What's worse, Broker is started but queries served by this broker will fail.This PR add a new judgment condition to fix it.
This PR has:
Key changed/added classes in this PR
DruidSchema.java