Historical load Segments enhancement#10650
Merged
himanshug merged 10 commits intoapache:masterfrom Dec 14, 2020
Merged
Conversation
9 tasks
himanshug
reviewed
Dec 11, 2020
|
|
||
| if (loc != null) { | ||
| File localStorageDir = new File(loc.getPath(), storageDir); | ||
| if (checkSegmentFilesIntact(localStorageDir)) { |
Contributor
There was a problem hiding this comment.
nit: maybe we can push this check inside findStorageLocationIfLoaded(segment) itself next to its dirExists? check.
Contributor
Author
There was a problem hiding this comment.
Done. Thanks for your review!
himanshug
approved these changes
Dec 11, 2020
Contributor
Author
|
@himanshug Thanks for your review and merge! |
harinirajendran
pushed a commit
to harinirajendran/druid
that referenced
this pull request
Dec 15, 2020
* load segments with segment files check * add more java docs * done * add java docs * revert misc * resolve ci failures * resolve ci failures * done Co-authored-by: yuezhang <yuezhang@freewheel.tv>
9 tasks
Contributor
|
@zhangyue19921010 thank you for the PR! I added some missing unit tests in this PR in #10737. |
JulianJaffePinterest
pushed a commit
to JulianJaffePinterest/druid
that referenced
this pull request
Jan 22, 2021
* load segments with segment files check * add more java docs * done * add java docs * revert misc * resolve ci failures * resolve ci failures * done Co-authored-by: yuezhang <yuezhang@freewheel.tv>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #10649.
There are two minor shortcomings in Druid historical loading, which can be enhancement to improve the robustness.
Fitst :
When Historical start up or do compact action, Druid will check segments are loaded or not.
The existing logic is based on whether the directory exists. When directory exists but segment files are damaged during download and unzip from DeepStorage, like crashed, this simple check will pass. What's worse is that any action using this segments like segment loading or datasource compaction will fail unexpectedly.
Second :
When set
druid.coordinator.loadqueuepeon.type=httpusing "http" implementation to assign segment loads/drops to historical, there is a LRU cache design to maintain idempotent if same request shows up again and to return status of a completed requestdruid/server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
Line 95 in b91a169
And only new requests can be executed
druid/server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
Line 510 in b91a169
If last action loads damaged segment files mentioned above , this action is failed and cached.
Next time coordinator asks historical to load this segment again, Historical Server will response failure based on cache rather than try to load again which will success.
This cycle may cause coordinator letting current historical node loading this segments over and over again. And Historical will keep response failure without a retry until LRU cache is Invalidation or Stream Index Task is failed because of completionTimeout limitation.
Description
downloadStartMarkerThis PR has:
Key changed/added classes in this PR
SegmentLoaderLocalCacheManager.javaSegmentLoadDropHandler.java