Skip to content

fix issue with jetty graceful shutdown of data servers when druid.serverview.type=http#13499

Merged
clintropolis merged 4 commits intoapache:masterfrom
clintropolis:fix-http-shutdown
Dec 6, 2022
Merged

fix issue with jetty graceful shutdown of data servers when druid.serverview.type=http#13499
clintropolis merged 4 commits intoapache:masterfrom
clintropolis:fix-http-shutdown

Conversation

@clintropolis
Copy link
Copy Markdown
Member

Fixes #10600.

Description

This PR fixes an issue when druid.serverview.type=http is set, the mechanics of which use “long polling” to keep track of which servers have which segments, which interferes with the data node “graceful” shutdown of the Jetty server (the period is controlled by druid.server.http.gracefulShutdownTimeout). The default timeout for the long polling requests is 4 minutes, which is much longer than the 30s default request timeout, so the requests never finish, the jetty server blocks for the entire gracefulShutdownTimeout, and leaves an ugly error about TimeoutException for the request that was not completed within the timeout.

This happens consistently with both realtime streaming tasks and historical servers.

The solution is to make ChangeRequestHistory which is the historical side mechanism that serves the segment changes to brokers and coordinators be tied into the ‘announcement’ phase of the lifecycle, so that the futures can be resolved and the executor shutdown before we begin shutting down jetty, which will ensure that these requests do not tie up the jetty threads for something that isn’t going to happen.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! 👍
Some minor comments

@clintropolis clintropolis added this to the 25.0 milestone Dec 5, 2022
@clintropolis clintropolis merged commit cf47216 into apache:master Dec 6, 2022
@clintropolis clintropolis deleted the fix-http-shutdown branch December 6, 2022 23:52
clintropolis added a commit to clintropolis/druid that referenced this pull request Dec 7, 2022
…verview.type=http (apache#13499)

* fix issue with http server inventory view blocking data node http server shutdown with long polling

* adjust

* fix test inspections
kfaraz pushed a commit that referenced this pull request Dec 8, 2022
…verview.type=http (#13499) (#13515)

* fix issue with http server inventory view blocking data node http server shutdown with long polling

* adjust

* fix test inspections
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stream ingest Error when set coordiantor 'druid.serverview.type=http'

3 participants