Synchronization of lookups during startup of druid processes#4758
Synchronization of lookups during startup of druid processes#4758himanshug merged 35 commits intoapache:masterfrom
Conversation
| private final String snapshotWorkingDir; | ||
|
|
||
| @JsonProperty | ||
| private int numLookupLoadingThreads = 10; |
There was a problem hiding this comment.
can you update the documentation with this and also mention that this threadpool is only created at the time of start and destroyed , it is not kept during the lifetime of jvm.
| private int numLookupLoadingThreads = 10; | ||
|
|
||
| @JsonProperty | ||
| private boolean disableLookupSync = false; |
There was a problem hiding this comment.
disableLookupSyncOnStartup ?
There was a problem hiding this comment.
beyond the startup, they are always synced.
| LockSupport.parkNanos(LookupReferencesManager.this, TimeUnit.MINUTES.toNanos(1)); | ||
| } | ||
| catch (Throwable t) { | ||
| LOG.makeAlert(t, "Error occured while lookup notice handling.").emit(); |
There was a problem hiding this comment.
is this just indentation change? ... why did it change ?
There was a problem hiding this comment.
The indentation changed when I changed the Runnable right above it.
| String tier = lookupListeningAnnouncerConfig.getLookupTier(); | ||
| List<LookupBean> lookupBeanList = new ArrayList<>(); | ||
| // Check if the coordinator is accessible | ||
| if (getCoordinatorUrl().isEmpty()) { |
There was a problem hiding this comment.
can you put this check inside getLookupListFromCoordinator(tier) instead ?
| ListeningScheduledExecutorService executorService = MoreExecutors.listeningDecorator( | ||
| Executors.newScheduledThreadPool( | ||
| lookupConfig.getNumLookupLoadingThreads(), | ||
| Execs.makeThreadFactory("LookupReferencesManager--%s") |
There was a problem hiding this comment.
can we name it different from the other long running .e.g LookupReferencesManager-startup-%s ?
|
|
||
| LookupExtractorFactoryContainer container = lookupBean.getContainer(); | ||
| if (container.getLookupExtractorFactory().start()) { | ||
| LOG.info( |
There was a problem hiding this comment.
can we add this log before calling start() ... "Starting lookup..." ? basically if some start() implementation is stuck for some reason then logs would tell which lookup is stuck on start().
| stateRef.set(new LookupUpdateState(ImmutableMap.of(), ImmutableList.of(), ImmutableList.of())); | ||
| } | ||
| } else { | ||
| LOG.info("Lookup synchronization is disabled"); |
There was a problem hiding this comment.
can you change it to "Lookup loading on startup is disabled" .
| response.getStatus(), | ||
| response.getContent() | ||
| ); | ||
| if (lookupSnapshotTaker != null) { |
There was a problem hiding this comment.
this check does not belong here, this should just return null.
| response.getContent() | ||
| ); | ||
| if (lookupSnapshotTaker != null) { | ||
| LOG.info("Attempting to load saved snapshot"); |
| } else { | ||
| // Older version of getSpecificTier returns a list of lookup names. | ||
| // Lookup loading is performed via snapshot if older version is present. | ||
| if (response.getContent().startsWith("[")) { |
There was a problem hiding this comment.
can you update comment to say, this check should be removed in a future release and is there only for backward compatibility?
| stateRef.set(new LookupUpdateState(ImmutableMap.of(), ImmutableList.of(), ImmutableList.of())); | ||
| private List<LookupBean> getLookupListFromSnapshot() | ||
| { | ||
| return lookupSnapshotTaker.pullExistingSnapshot(); |
There was a problem hiding this comment.
we should do the null check for lookupSnapshotTaker here
| catch (Exception e) { | ||
| LOG.error("Error encountered while connecting to Coordinator"); | ||
| } | ||
| return ""; |
There was a problem hiding this comment.
and also this should probably be inside the catch block
| final Server instance = selector.pick(); | ||
| if (instance == null) { | ||
| LOG.info("Coordinator instance unavailable."); | ||
| return ""; |
| ).get(); | ||
| } | ||
|
|
||
| private LookupExtractorFactoryContainer getLookupEntryFromCode(String tier, String lookupCode) |
There was a problem hiding this comment.
this is not needed probably.
| @JsonSubTypes.Type(name = "timeFormat", value = TimeFormatExtractionFn.class), | ||
| @JsonSubTypes.Type(name = "identity", value = IdentityExtractionFn.class), | ||
| @JsonSubTypes.Type(name = "lookup", value = LookupExtractionFn.class), | ||
| @JsonSubTypes.Type(name = "registeredLookup", value = RegisteredLookupExtractionFn.class), |
There was a problem hiding this comment.
I had to move this to https://github.com/a2l007/druid/blob/8f2a360119be5d619a2bc0d57976f483faaad610/server/src/main/java/io/druid/query/lookup/LookupModule.java#L89
as part of the refactoring.
| @JsonSubTypes.Type(name = "extraction", value = ExtractionDimensionSpec.class), | ||
| @JsonSubTypes.Type(name = "regexFiltered", value = RegexFilteredDimensionSpec.class), | ||
| @JsonSubTypes.Type(name = "listFiltered", value = ListFilteredDimensionSpec.class), | ||
| @JsonSubTypes.Type(name = "lookup", value = LookupDimensionSpec.class) |
There was a problem hiding this comment.
I had to move this to https://github.com/a2l007/druid/blob/8f2a360119be5d619a2bc0d57976f483faaad610/server/src/main/java/io/druid/query/lookup/LookupModule.java#L89
as part of the refactoring.
|
@himanshug Suggested changes have been made. Please review. |
|
Requests to coordinator are now done via DruidLeaderClient. |
| |Property|Description|Default| | ||
| |--------|-----------|-------| | ||
| |`druid.lookup.snapshotWorkingDir`| Working path used to store snapshot of current lookup configuration, leaving this property null will disable snapshot/bootstrap utility|null| | ||
| |`druid.lookup.numLookupLoadingThreads`| Number of threads for loading the lookups in parallel.This threadpool is created only at the time of start and destroyed once startup is done. It is not kept during the lifetime of the JVM.|10| |
There was a problem hiding this comment.
nit: "..lookups in parallel on startup. .."
| private int numLookupLoadingThreads = 10; | ||
|
|
||
| @JsonProperty | ||
| private boolean disableLookupSyncOnStartup = false; |
There was a problem hiding this comment.
can you document this as well ?
|
|
||
| LookupExtractorFactoryContainer container = lookupBean.getContainer(); | ||
| LOG.info( | ||
| "Started lookup [%s]:[%s]", |
| container | ||
| ); | ||
| if (container.getLookupExtractorFactory().start()) { | ||
| return new AbstractMap.SimpleImmutableEntry<>(lookupBean.getName(), container); |
There was a problem hiding this comment.
add a log here with "Started lookup..."
| } | ||
| stateRef.set(new LookupUpdateState(builder.build(), ImmutableList.of(), ImmutableList.of())); | ||
| } | ||
| catch (Exception e) { |
There was a problem hiding this comment.
we should set thread to interrupted state if it was an interrupted exception
| LOG.error(e, "Failed to finish lookup load process."); | ||
| } | ||
| executorService.shutdownNow(); | ||
| } else { |
There was a problem hiding this comment.
can you also wait for executorService to finish using executorService.awaitTermination(1 minute) ... if it doesn't finish within that duration then log a warning.
| { | ||
| ImmutableMap.Builder<String, LookupExtractorFactoryContainer> builder = ImmutableMap.builder(); | ||
| ListeningScheduledExecutorService executorService = MoreExecutors.listeningDecorator( | ||
| Executors.newScheduledThreadPool( |
There was a problem hiding this comment.
Could use Execs.multiThreaded()
| private void startLookups(List<LookupBean> lookupBeanList) | ||
| { | ||
| ImmutableMap.Builder<String, LookupExtractorFactoryContainer> builder = ImmutableMap.builder(); | ||
| ListeningScheduledExecutorService executorService = MoreExecutors.listeningDecorator( |
There was a problem hiding this comment.
Could you please rewrite it using standard Java API, ExecutorCompletionService?
There was a problem hiding this comment.
I see that ScheduledExecutorService is not required here. I have changed it to ListeningExecutorService. Is this sufficient or do I still need to refactor it to use ExecutorCompletionService?
There was a problem hiding this comment.
I suggest to use ExecutorCompletionService because it's standard Java API, which replaces Guava functionality that you used, that means that this functionality has a good chance of being deprecated and removed from Guava in the future, adding compatibility problems.
Because of this dumb Guava's compatibility policy (remove things), we better use Guava as little as possible.
Squashed commit of the following: commit 24eb9fb Merge: eae35a5 07aa405 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Tue Oct 3 21:40:03 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit eae35a5 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Tue Oct 3 21:39:59 2017 -0500 Update LRM commit 37ae0c9 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Tue Oct 3 11:46:38 2017 -0500 Make executorservice local commit fc9d53e Merge: bff5b09 c19cd23 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Tue Oct 3 11:45:31 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit bff5b09 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Mon Oct 2 21:37:10 2017 -0500 Add tests to LookupConfig commit 1cb6627 Merge: 2ea72af 6f91d9c Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Mon Oct 2 19:45:07 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 2ea72af Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Mon Oct 2 08:57:58 2017 -0500 Revert Lookupconfig constructor changes commit c7ecfdc Merge: 9876ae0 1f2074c Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 28 17:14:08 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 9876ae0 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 28 17:14:02 2017 -0500 Remove lookup config constructor commit 99c6f60 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 28 11:50:53 2017 -0500 Refactoring and doc changes commit ce32e04 Merge: 361d255 2c30d5b Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 28 08:14:39 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 361d255 Merge: e5fe3c2 c3fbe51 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 28 08:14:35 2017 -0500 Doc changes commit e5fe3c2 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 20 21:36:16 2017 -0500 Update LookupConfig commit 6d9278f Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 20 21:33:31 2017 -0500 Move executorservice shutdown to finally block commit 263dfdc Merge: 727a706 a36adc6 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 20 20:54:04 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 727a706 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Fri Sep 15 08:59:12 2017 -0500 Rename flag commit 1db8914 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Fri Sep 15 08:37:33 2017 -0500 Update docs commit 756098b Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 14 22:45:57 2017 -0500 Make disablelookups flag true by default commit e9b3ddc Merge: f3ad262 d37be5e Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Thu Sep 14 21:43:28 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit f3ad262 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 13 16:01:00 2017 -0500 Wait before thread shutdown commit 4a11a49 Merge: 6731438 4f6eb47 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 13 15:32:19 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 6731438 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 13 11:15:01 2017 -0500 Change coordinator instance to be retrieved by DruidLeaderClient commit a3bfc31 Merge: 6eaffd4 587f180 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 13 10:52:39 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 6eaffd4 Merge: 317e4e3 3a29521 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Tue Sep 12 16:13:32 2017 -0500 Merge branch 'master' of https://github.com/druid-io/druid into synclookups commit 317e4e3 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 6 20:56:53 2017 -0500 Minor refactors and doc update commit 8f2a360 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 6 15:58:35 2017 -0500 Refactor of Lookup classes commit 5d41b29 Author: Atul Mohan <atulmohan@yahoo-inc.com> Date: Wed Sep 6 15:12:08 2017 -0500 Changes for lookup synchronization
| "LookupReferencesManager-Startup-%s" | ||
| ); | ||
| ExecutorCompletionService completionService = new ExecutorCompletionService(executorService); | ||
| try { |
There was a problem hiding this comment.
@leventov ExecutorCompletionService is being used in the latest commit. Please review.
| builder.put(lookupResult); | ||
| } | ||
| } | ||
| catch (InterruptedException | ExecutionException e) { |
There was a problem hiding this comment.
On InterruptedException, it should fall through to the catch (Exception e) { block below. And cancel all the remaining futures.
There was a problem hiding this comment.
I have cancelled the remaining futures on ExecutionException.
| } | ||
| ); | ||
| } | ||
| IntStream.range(0, lookupBeanList.size()) |
There was a problem hiding this comment.
Please just use loop, also it will make easier to fall thorough with InterruptedException below.
| catch (Exception e) { | ||
| LOG.error(e, "Failed to finish lookup load process."); | ||
| for (Future future : futures) { | ||
| if (!future.isDone()) { |
There was a problem hiding this comment.
isDone not needed -- see example in ExecutorCompletionService's javadoc.
| } | ||
| private List<LookupBean> getLookupsListFromLookupConfig() | ||
| { | ||
| List<LookupBean> lookupBeanList = new ArrayList<>(); |
| // Lookup loading is performed via snapshot if older version is present. | ||
| // This check is only for backward compatibility and should be removed in a future release | ||
| if (response.getContent().startsWith("[")) { | ||
| return null; |
There was a problem hiding this comment.
Please add logging message here, this is a very confusing situation in practice, is just says "coordinator not available"
|
@a2l007 could you please finish this? |
|
@leventov I have updated the logging as requested. Do you have any pending feedback? |
|
@himanshug @drcrallen more comments? |
|
LGTM |
…mit was one in the series of attempts to fix the problem with QTL startup, that was ultimately fixed by apache#4758. So this commit is probably not needed anymore, but I’m not 100% sure. In our Druid 0.11.0 deployments, this commit was definitely harmless, so not risking and moving this commit to `0.12.1-mmx` for the sake of stability.
This introduces lookup synchronization during startup of Druid processes. The main motivation for this is to cut the lookup load time when performed by the Coordinator, so that realtime tasks have lookups available immediately.
Upon startup, processes communicate with the coordinator on retrieving the lookups to be loaded and loads them before finishing up the start sequence. Thus all the processes have the lookups loaded before starting to serve queries. If it fails to communicate with the coordinator (Coordinator unavailable) then it falls back to snapshot directory if there is any.
This does not fail startup of the process in case of failing to load lookups. Following are the high level changes: