
Historical load Segments enhancement #10650

Merged
himanshug merged 10 commits into apache:master from zhangyue19921010:historical-load-enhancment on Dec 14, 2020

@zhangyue19921010 (Contributor) commented Dec 7, 2020

Fixes #10649.
There are two minor shortcomings in how Druid Historicals load segments. Fixing them improves robustness.

First:
When a Historical starts up, or when a compaction action runs, Druid checks whether segments are already loaded.
The existing check only tests whether the segment directory exists. If the directory exists but the segment files were damaged while being downloaded and unzipped from deep storage (for example, because the process crashed midway), this simple check still passes. Worse, any action that uses these segments, such as segment loading or datasource compaction, then fails unexpectedly.

Second:
When druid.coordinator.loadqueuepeon.type=http is set, the Coordinator uses the "http" implementation to assign segment loads/drops to Historicals. The Historical keeps an LRU cache of request statuses so that handling stays idempotent when the same request shows up again, and so that it can return the status of a completed request:

private final Cache<DataSegmentChangeRequest, AtomicReference<Status>> requestStatuses;

Only new requests are executed:

if (requestStatuses.getIfPresent(changeRequest) == null) {

If the previous attempt loaded the damaged segment files described above, that attempt fails and its failed status is cached.

The next time the Coordinator asks the Historical to load this segment, the Historical responds with the cached failure instead of trying again, even though a fresh attempt would succeed.

This cycle can make the Coordinator ask the same Historical to load the segment over and over, while the Historical keeps responding with failure, without retrying, until the LRU cache entry is invalidated or the streaming indexing task fails because of its completionTimeout limit.
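The failure-caching loop described above can be reproduced with a minimal sketch. It is not Druid's SegmentLoadDropHandler: the class names, the String request keys, and the synchronous Supplier<Boolean> load action are hypothetical simplifications, and the LRU cache is a plain access-ordered LinkedHashMap rather than the Cache type used in the real code. With retryFailed disabled, a request whose first attempt failed keeps returning the cached FAILED status; with it enabled, a FAILED entry is treated like a new request and executed again.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class StatusCacheSketch {
  enum State { SUCCESS, FAILED }

  /** Hypothetical bounded LRU map of request -> status of its last completed attempt. */
  static class RequestStatusCache {
    private final Map<String, State> statuses;
    private final boolean retryFailed;

    RequestStatusCache(int maxSize, boolean retryFailed) {
      this.retryFailed = retryFailed;
      // accessOrder = true, so removeEldestEntry evicts the least recently used entry
      this.statuses = new LinkedHashMap<String, State>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, State> eldest) {
          return size() > maxSize;
        }
      };
    }

    /** Runs the action for new requests, and for previously FAILED ones when retryFailed is set. */
    State process(String request, Supplier<Boolean> action) {
      State cached = statuses.get(request);
      if (cached != null && !(retryFailed && cached == State.FAILED)) {
        return cached; // idempotent replay of the cached result
      }
      State result = action.get() ? State.SUCCESS : State.FAILED;
      statuses.put(request, result);
      return result;
    }
  }

  /** Hypothetical load that fails on its first attempt (damaged files) and succeeds afterwards. */
  static Supplier<Boolean> failOnceThenSucceed(int[] attempts) {
    return () -> ++attempts[0] > 1;
  }

  public static void main(String[] args) {
    int[] oldAttempts = {0};
    RequestStatusCache cacheOnly = new RequestStatusCache(100, false);
    cacheOnly.process("load seg-1", failOnceThenSucceed(oldAttempts));
    State oldSecond = cacheOnly.process("load seg-1", failOnceThenSucceed(oldAttempts));
    System.out.println("cache only: " + oldSecond + " after " + oldAttempts[0] + " attempt(s)");

    int[] newAttempts = {0};
    RequestStatusCache withRetry = new RequestStatusCache(100, true);
    withRetry.process("load seg-1", failOnceThenSucceed(newAttempts));
    State newSecond = withRetry.process("load seg-1", failOnceThenSucceed(newAttempts));
    System.out.println("with retry: " + newSecond + " after " + newAttempts[0] + " attempt(s)");
  }
}
```

Running main prints `cache only: FAILED after 1 attempt(s)` and `with retry: SUCCESS after 2 attempt(s)`. Only FAILED entries are re-executed here; a SUCCESS entry is still replayed from the cache, which preserves the idempotency the cache exists for.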

Description

  1. Check the integrity of segment files before loading them, based on a downloadStartMarker.
  2. Retry loading a segment after a failed attempt instead of replaying the cached failure.
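A minimal sketch of the marker idea in item 1, assuming the marker is an empty file created inside the segment directory before files are pulled from deep storage and deleted only after the pull and unzip finish. If the process crashes or throws in between, the leftover marker tells a later check that the directory exists but its contents may be damaged. The marker file name, the Downloader interface, and the helpers isLoadedAndIntact and findLoadedLocation are hypothetical; this is not the code merged in the PR. findLoadedLocation puts the marker check next to the directory-existence check, in the spirit of the reviewer's suggestion for findStorageLocationIfLoaded.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class DownloadMarkerSketch {
  // Hypothetical marker file name, borrowed from the concept named in the PR description.
  static final String MARKER_FILE = "downloadStartMarker";

  /** Hypothetical stand-in for "pull from deep storage and unzip into dir". */
  interface Downloader {
    void pullAndUnzip(File dir) throws IOException;
  }

  /** Creates the marker first and deletes it only after the pull completes without error. */
  static void load(File segmentDir, Downloader downloader) throws IOException {
    Files.createDirectories(segmentDir.toPath());
    File marker = new File(segmentDir, MARKER_FILE);
    marker.createNewFile(); // no-op if a marker is left over from an earlier failed attempt
    downloader.pullAndUnzip(segmentDir); // an exception or crash here leaves the marker in place
    Files.delete(marker.toPath());
  }

  /** Loaded and intact: the directory exists AND carries no leftover marker. */
  static boolean isLoadedAndIntact(File segmentDir) {
    return segmentDir.isDirectory() && !new File(segmentDir, MARKER_FILE).exists();
  }

  /** Returns the first storage location holding an intact copy of the segment, if any. */
  static Optional<File> findLoadedLocation(List<File> locations, String storageDir) {
    for (File location : locations) {
      if (isLoadedAndIntact(new File(location, storageDir))) {
        return Optional.of(location);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("segment-cache");
    String storageDir = "wikipedia/2020-12-01/v1/0"; // hypothetical segment path
    File segmentDir = new File(root.toFile(), storageDir);

    // First attempt: some bytes are written, then the "unzip" fails midway.
    try {
      load(segmentDir, dir -> {
        Files.write(new File(dir, "partial.bin").toPath(), new byte[] {1, 2});
        throw new IOException("simulated failure during unzip");
      });
    } catch (IOException expected) {
      System.out.println("first load failed: " + expected.getMessage());
    }
    System.out.println("directory exists: " + segmentDir.isDirectory()); // true: a dir-only check passes
    System.out.println("intact: " + isLoadedAndIntact(segmentDir)); // false: marker left behind

    // Second attempt completes, so the marker is removed.
    load(segmentDir, dir -> Files.write(new File(dir, "index.bin").toPath(), new byte[] {1, 2, 3}));
    System.out.println("found after reload: "
        + findLoadedLocation(Arrays.asList(root.toFile()), storageDir).isPresent()); // true
  }
}
```

The sketch leaves the partially written file in place when it reloads; a real loader would presumably delete the damaged directory before pulling the segment again.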

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • SegmentLoaderLocalCacheManager.java
  • SegmentLoadDropHandler.java


if (loc != null) {
File localStorageDir = new File(loc.getPath(), storageDir);
if (checkSegmentFilesIntact(localStorageDir)) {
Contributor

nit: maybe we can push this check inside findStorageLocationIfLoaded(segment) itself, next to its dirExists check.

@zhangyue19921010 (Contributor Author) Dec 14, 2020

Done. Thanks for your review!

@himanshug himanshug merged commit 0ad27c0 into apache:master Dec 14, 2020
@zhangyue19921010 (Contributor Author):

@himanshug Thanks for your review and merge!

harinirajendran pushed a commit to harinirajendran/druid that referenced this pull request Dec 15, 2020
* load segments with segment files check

* add more java docs

* done

* add java docs

* revert misc

* resolve ci failures

* resolve ci failures

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
@jihoonson jihoonson added this to the 0.21.0 milestone Jan 4, 2021
@jihoonson (Contributor) commented Jan 8, 2021

@zhangyue19921010 thank you for the PR! I added some missing unit tests in this PR in #10737.

JulianJaffePinterest pushed a commit to JulianJaffePinterest/druid that referenced this pull request Jan 22, 2021
@zhangyue19921010 zhangyue19921010 deleted the historical-load-enhancment branch February 9, 2021 08:42

Development

Successfully merging this pull request may close these issues.

Druid Historical may never load specific segment again which is failed loading before.

3 participants