fix issue with jetty graceful shutdown of data servers when druid.serverview.type=http #13499
Merged
clintropolis merged 4 commits into apache:master on Dec 6, 2022
…ver shutdown with long polling
rohangarg (Member) reviewed on Dec 5, 2022 and left a comment:
Nice find! 👍
Some minor comments
gianm approved these changes on Dec 5, 2022
rohangarg approved these changes on Dec 5, 2022
clintropolis added a commit to clintropolis/druid that referenced this pull request on Dec 7, 2022
…verview.type=http (apache#13499)
* fix issue with http server inventory view blocking data node http server shutdown with long polling
* adjust
* fix test inspections
Fixes #10600.
Description
This PR fixes an issue that occurs when druid.serverview.type=http is set. This mode uses “long polling” to keep track of which servers have which segments, and that interferes with the “graceful” shutdown of the Jetty server on data nodes (the period is controlled by druid.server.http.gracefulShutdownTimeout). The default timeout for the long polling requests is 4 minutes, which is much longer than the 30s default request timeout, so the requests never finish within the shutdown period, the Jetty server blocks for the entire gracefulShutdownTimeout, and it leaves an ugly error about a TimeoutException for the request that was not completed within the timeout.
This happens consistently with both realtime streaming tasks and historical servers.
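The blocking behaviour can be illustrated in miniature with plain `java.util.concurrent` types. This is only a sketch under stated assumptions: `PendingPolls`, `poll`, `resolveAll`, and `finishesWithin` are hypothetical names made up for illustration, not Druid's or Jetty's API, and the timeouts are shrunk to milliseconds. It shows that a long-poll request left pending outlasts a short graceful window, while one resolved before the wait begins finishes immediately.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class LongPollShutdownSketch {

    // Hypothetical stand-in for a server-side registry of waiting long-poll requests.
    static final class PendingPolls {
        private final List<CompletableFuture<String>> waiting = new CopyOnWriteArrayList<>();

        // A client asks for changes; nothing has changed, so the future stays pending.
        CompletableFuture<String> poll() {
            CompletableFuture<String> future = new CompletableFuture<>();
            waiting.add(future);
            return future;
        }

        // Complete every pending request so no caller keeps waiting on it.
        // (Not safe against polls registered concurrently; fine for a sketch.)
        void resolveAll(String reason) {
            for (CompletableFuture<String> future : waiting) {
                future.complete(reason);
            }
            waiting.clear();
        }
    }

    // Models a graceful shutdown wait: true if the request finished within the window.
    static boolean finishesWithin(CompletableFuture<String> request, long gracefulMillis) {
        try {
            request.get(gracefulMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PendingPolls polls = new PendingPolls();

        // Pending long poll: the graceful window elapses before it completes.
        CompletableFuture<String> stuck = polls.poll();
        System.out.println(finishesWithin(stuck, 50)); // prints "false"

        // Resolving pending polls first lets the wait return right away.
        CompletableFuture<String> released = polls.poll();
        polls.resolveAll("server stopping");
        System.out.println(finishesWithin(released, 50)); // prints "true"
    }
}
```

In the first case the caller waits out the whole window and gets a timeout, which mirrors the TimeoutException described above.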
The solution is to tie ChangeRequestHistory, the historical-side mechanism that serves segment changes to brokers and coordinators, into the ‘announcement’ phase of the lifecycle. That way the futures can be resolved and the executor shut down before we begin shutting down Jetty, which ensures that these requests do not tie up Jetty threads waiting for something that isn’t going to happen.
This PR has: