Make SegmentLoader extensible and customizable #11398
abhishekagarwal87 merged 17 commits into apache:master from
This pull request fixes 1 alert when merging 0af368e into 8037a54 - view on LGTM.com fixed alerts:

This pull request fixes 1 alert when merging 92cbce4 into 497f2a1 - view on LGTM.com fixed alerts:

This pull request introduces 2 alerts and fixes 2 when merging 0e8d0bf into 497f2a1 - view on LGTM.com new alerts:
fixed alerts:

This pull request introduces 2 alerts and fixes 2 when merging 0408a52 into 17efa6f - view on LGTM.com new alerts:
fixed alerts:

This pull request introduces 2 alerts and fixes 2 when merging 2e6fd24 into d5e8d4d - view on LGTM.com new alerts:
fixed alerts:

public ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy, SegmentLazyLoadFailCallback loadFailed)
    throws SegmentLoadingException
{
  final ReferenceCountingLock lock = createOrGetLock(segment);
The lock acquisition has been removed from here since getSegmentFiles already acquires the lock.
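To illustrate the point above, here is a minimal sketch, assuming segments are identified by plain String ids and that "downloading" only records the id. LockingCacheManager and DelegatingLoader are hypothetical names, not Druid classes. The per-segment lock lives only in getSegmentFiles, so getSegment can delegate without taking a lock of its own.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified cache manager: segments are String ids and
// "downloading" only records the id, so no files are touched.
class LockingCacheManager
{
  private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  private final Set<String> cached = ConcurrentHashMap.newKeySet();
  final AtomicInteger downloads = new AtomicInteger();

  // The per-segment lock is acquired here, so callers do not need to acquire it again.
  String getSegmentFiles(String segmentId)
  {
    ReentrantLock lock = locks.computeIfAbsent(segmentId, id -> new ReentrantLock());
    lock.lock();
    try {
      if (!cached.contains(segmentId)) {
        downloads.incrementAndGet(); // stand-in for pulling files from deep storage
        cached.add(segmentId);
      }
      return "/cache/" + segmentId;
    }
    finally {
      lock.unlock();
    }
  }
}

// Hypothetical loader: getSegment takes no lock and relies on getSegmentFiles for mutual exclusion.
class DelegatingLoader
{
  private final LockingCacheManager cacheManager;

  DelegatingLoader(LockingCacheManager cacheManager)
  {
    this.cacheManager = cacheManager;
  }

  String getSegment(String segmentId)
  {
    String dir = cacheManager.getSegmentFiles(segmentId);
    return "Segment(" + dir + ")"; // stand-in for building the real segment object
  }
}
```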
 * @return Segment object wrapped inside {@link ReferenceCountingSegment}.
 * @throws SegmentLoadingException
 */
ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy, SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException;
Is the SegmentLoader guaranteed to return the same ReferenceCountingSegment instance across multiple calls of getSegment? Should it?
It can do either, and the caller should not depend on that behavior. From the caller's perspective, it gets a segment object wrapped inside a ReferenceCountingSegment. Implementations may add optimizations to avoid repeating expensive work.
I think this should be documented in the javadoc.
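A minimal sketch of the contract described in this thread, assuming a String id stands in for DataSegment and a bare wrapper (the hypothetical RefSegment) stands in for ReferenceCountingSegment. Both implementations satisfy the same interface; only the memoizing one returns the same instance on repeated calls, which is why callers should not compare results by identity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for ReferenceCountingSegment: it only carries the segment id.
class RefSegment
{
  final String id;

  RefSegment(String id)
  {
    this.id = id;
  }
}

// Contract: repeated calls may or may not return the same instance, so callers must not rely on identity.
interface SegmentSource
{
  RefSegment getSegment(String segmentId);
}

// Builds a new wrapper on every call.
class FreshSource implements SegmentSource
{
  @Override
  public RefSegment getSegment(String segmentId)
  {
    return new RefSegment(segmentId);
  }
}

// Memoizes wrappers per segment id to avoid repeating expensive work.
class MemoizingSource implements SegmentSource
{
  private final ConcurrentMap<String, RefSegment> cache = new ConcurrentHashMap<>();

  @Override
  public RefSegment getSegment(String segmentId)
  {
    return cache.computeIfAbsent(segmentId, RefSegment::new);
  }
}
```

Either implementation is valid under the contract; a caller that needs to know whether two results refer to the same segment should compare ids, not references.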
 * @param segment - Segment to release the location for.
 * @return - True if any location was reserved and released, false otherwise.
 */
boolean release(DataSegment segment);
The same code that calls reserve can call release. The idea is that if reserve is called explicitly, release should be called explicitly as well. In case of failures, SegmentLoader itself should not release the location; it should leave that to the caller.
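The caller-side pattern can be sketched as follows. This is a simplified illustration, not Druid's API: BudgetCacheManager and ReservingCaller are hypothetical, segments are String ids, and the segment size is passed explicitly instead of being read from a DataSegment. The code that calls reserve also calls release when the download fails; the cache manager never undoes the reservation on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified cache manager with a byte budget; segments are String ids.
class BudgetCacheManager
{
  private final long capacity;
  private long used = 0;
  private final Map<String, Long> reserved = new HashMap<>();

  BudgetCacheManager(long capacity)
  {
    this.capacity = capacity;
  }

  // Returns true if space for the segment is reserved (or already was).
  synchronized boolean reserve(String segmentId, long size)
  {
    if (reserved.containsKey(segmentId)) {
      return true;
    }
    if (used + size > capacity) {
      return false;
    }
    reserved.put(segmentId, size);
    used += size;
    return true;
  }

  // Returns true only if a reservation existed and was released.
  synchronized boolean release(String segmentId)
  {
    Long size = reserved.remove(segmentId);
    if (size == null) {
      return false;
    }
    used -= size;
    return true;
  }

  synchronized boolean isReserved(String segmentId)
  {
    return reserved.containsKey(segmentId);
  }

  synchronized long used()
  {
    return used;
  }
}

// Hypothetical caller: whoever calls reserve explicitly is also responsible for release.
class ReservingCaller
{
  static boolean fetch(BudgetCacheManager cacheManager, String segmentId, long size, Runnable download)
  {
    if (!cacheManager.reserve(segmentId, size)) {
      return false; // not enough space; nothing to undo
    }
    boolean succeeded = false;
    try {
      download.run();
      succeeded = true;
      return true;
    }
    finally {
      if (!succeeded) {
        cacheManager.release(segmentId); // the caller, not the cache manager, undoes the reservation
      }
    }
  }
}
```

On success the reservation is kept, so the same caller would call release later, once it no longer needs the segment.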
jihoonson left a comment
Thanks @abhishekagarwal87, I left some comments mostly about the interface change.
This PR splits the current SegmentLoader into SegmentLoader and SegmentCacheManager.
SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. The default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once they are downloaded. This class will be used in SegmentManager to construct Segment objects.
SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future will support reserving and releasing cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction and re-indexing, where segment files need to be downloaded locally.
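The split can be sketched with two hypothetical, simplified interfaces. CacheManagerSketch and LoaderSketch stand in for SegmentCacheManager and SegmentLoader; they take String ids instead of DataSegment, return a String instead of a Segment, and do not throw SegmentLoadingException. The cleanup method name is an assumption. The default loader only builds the segment and delegates all disk work to the cache manager.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified cache-management side.
interface CacheManagerSketch
{
  boolean isSegmentCached(String segmentId);

  File getSegmentFiles(String segmentId); // fetches to local disk if needed

  void cleanup(String segmentId);
}

// Hypothetical, simplified loader side: it only builds segment objects.
interface LoaderSketch
{
  String getSegment(String segmentId);

  void cleanup(String segmentId);
}

// Default loader: delegates all disk work to the cache manager.
class DefaultLoaderSketch implements LoaderSketch
{
  private final CacheManagerSketch cacheManager;

  DefaultLoaderSketch(CacheManagerSketch cacheManager)
  {
    this.cacheManager = cacheManager;
  }

  @Override
  public String getSegment(String segmentId)
  {
    File dir = cacheManager.getSegmentFiles(segmentId);
    return "Segment[" + dir.getName() + "]"; // stand-in for building a real Segment from the files
  }

  @Override
  public void cleanup(String segmentId)
  {
    cacheManager.cleanup(segmentId);
  }
}

// In-memory stand-in: records which ids are "on disk" without touching the file system.
class InMemoryCacheManager implements CacheManagerSketch
{
  private final File root = new File("segment-cache");
  private final Set<String> onDisk = new HashSet<>();

  @Override
  public synchronized boolean isSegmentCached(String segmentId)
  {
    return onDisk.contains(segmentId);
  }

  @Override
  public synchronized File getSegmentFiles(String segmentId)
  {
    onDisk.add(segmentId);
    return new File(root, segmentId);
  }

  @Override
  public synchronized void cleanup(String segmentId)
  {
    onDisk.remove(segmentId);
  }
}
```

Under this split, an ingestion task such as compaction would depend only on the cache-manager side to obtain local files, while SegmentManager would go through the loader.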
This pull request introduces 2 alerts and fixes 2 when merging dd80d1cd3ebd5a81220f5e2ddac722e7c9226223 into 0453e46 - view on LGTM.com new alerts:
fixed alerts:

dd80d1c to b1c874b

This pull request introduces 2 alerts and fixes 2 when merging b1c874b into 6ce3b6c - view on LGTM.com new alerts:
fixed alerts:

 * @param segment - Segment to release the location for.
 * @return - True if any location was reserved and released, false otherwise.
 */
boolean release(DataSegment segment);
Can you clarify the contract between this method and getSegmentFiles? For example, what should happen when release is called if reserve was not called but getSegmentFiles was called?
Sure. If reserve is not called, getSegmentFiles will reserve the space. I will document this.
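One reading of this contract, sketched with a hypothetical ImplicitReserveCacheManager (String ids, no capacity limit): getSegmentFiles reserves the space itself when the caller has not, and a later release then gives that implicit reservation back. Whether the real implementation behaves exactly this way on release is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the reserve / getSegmentFiles / release contract.
class ImplicitReserveCacheManager
{
  private final Set<String> reserved = new HashSet<>();
  private final Set<String> downloaded = new HashSet<>();

  synchronized boolean reserve(String segmentId)
  {
    reserved.add(segmentId);
    return true; // no capacity limit in this sketch
  }

  synchronized String getSegmentFiles(String segmentId)
  {
    if (!reserved.contains(segmentId)) {
      reserve(segmentId); // the caller did not reserve, so reserve implicitly
    }
    downloaded.add(segmentId); // stand-in for the actual download
    return "/cache/" + segmentId;
  }

  // True only if a reservation (explicit or implicit) existed and was released.
  synchronized boolean release(String segmentId)
  {
    downloaded.remove(segmentId);
    return reserved.remove(segmentId);
  }
}
```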
 * {@link StorageLocation} since we don't want callers to operate on {@code StorageLocation} directly outside {@code SegmentLoader}.
 * {@link SegmentLoader} operates on the {@code StorageLocation} objects in a thread-safe manner.
 */
boolean reserve(DataSegment segment);
Should isSegmentCached still return false after reserve is called? It would be worth documenting.
Yes, I will document it.
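A minimal sketch of the behavior agreed on here, assuming reservation and cached state are tracked separately, so that reserve alone does not make isSegmentCached return true. ReserveVsCachedManager and its markDownloaded hook are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: reserving space and having the files cached are tracked separately.
class ReserveVsCachedManager
{
  private final Set<String> reserved = new HashSet<>();
  private final Set<String> cached = new HashSet<>();

  synchronized boolean reserve(String segmentId)
  {
    reserved.add(segmentId);
    return true;
  }

  // A reservation alone does not make a segment cached.
  synchronized boolean isSegmentCached(String segmentId)
  {
    return cached.contains(segmentId);
  }

  // Hypothetical hook called once the files have actually been downloaded.
  synchronized void markDownloaded(String segmentId)
  {
    if (!reserved.contains(segmentId)) {
      throw new IllegalStateException("space was not reserved for " + segmentId);
    }
    cached.add(segmentId);
  }
}
```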
    cleanupCacheFiles(loc.getPath(), storageDir);
  }
  boolean success = loadInLocationWithStartMarkerQuietly(loc, segment, storageDir, true);
  if (success) {
Shouldn't loc.release be called when success is false? Please add a test to verify this behavior.
loadInLocationWithStartMarkerQuietly(loc, segment, storageDir, true);
This method releases the location, since true is passed as the value of the releaseLocation flag.
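The failure handling described in the reply can be sketched like this. LocationStub and QuietLoading are hypothetical stand-ins for StorageLocation and loadInLocationWithStartMarkerQuietly; the start marker file is not modeled. With releaseLocation set to true, a failed load gives the reserved space back; with false, the caller remains responsible for releasing it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for StorageLocation: tracks reserved segment directories.
class LocationStub
{
  private final Set<String> files = new HashSet<>();

  synchronized boolean reserve(String segmentDir)
  {
    return files.add(segmentDir);
  }

  synchronized boolean release(String segmentDir)
  {
    return files.remove(segmentDir);
  }

  synchronized boolean isReserved(String segmentDir)
  {
    return files.contains(segmentDir);
  }
}

class QuietLoading
{
  // Hypothetical analogue of loadInLocationWithStartMarkerQuietly (start marker not modeled):
  // reports failure as false and, if releaseLocation is true, gives the reserved space back.
  static boolean loadQuietly(LocationStub loc, String segmentDir, boolean releaseLocation, Runnable download)
  {
    try {
      download.run();
      return true;
    }
    catch (RuntimeException e) {
      if (releaseLocation) {
        loc.release(segmentDir);
      }
      return false;
    }
  }
}
```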
public synchronized boolean isReserved(String segmentDir)
{
  final File segmentFile = new File(path, segmentDir);
  return files.contains(segmentFile);
}

public File segmentDirectoryAsFile(String segmentDir)
{
  return new File(path, segmentDir);
}

The LGTM error in https://lgtm.com/projects/g/apache/druid/rev/pr-1ff2ba29372f1d2b44941bb55f75b5830f808401 seems like a false alarm. Perhaps we should suppress it for this change.
jihoonson left a comment
LGTM. Thanks @abhishekagarwal87.

This pull request introduces 2 alerts and fixes 2 when merging a549256 into 167c452 - view on LGTM.com new alerts:
fixed alerts:

This pull request introduces 1 alert and fixes 2 when merging 89cd7e5 into 167c452 - view on LGTM.com new alerts:
fixed alerts:

The LGTM error has been suppressed, but the suppression will take effect only after the PR is merged.
Description
This PR refactors the code related to segment loading, specifically SegmentLoader and SegmentLoaderLocalCacheManager. SegmentLoader is marked UnstableAPI, which means it can be extended outside core Druid in custom extensions. Here is a summary of changes:
- SegmentLoader returns an instance of ReferenceCountingSegment instead of Segment. Earlier, SegmentManager was wrapping Segment objects inside ReferenceCountingSegment; that is now moved to SegmentLoader. With this, a custom implementation can track the references of segments. It also allows implementations to create custom ReferenceCountingSegment subclasses. For this reason, the constructor visibility in ReferenceCountingSegment is changed from private to protected.
- SegmentCacheManager has two additional methods, reserve(DataSegment) and release(DataSegment). These methods let the caller reserve or release space without calling SegmentLoader#getSegment. We already had similar methods in StorageLocation, and now they are available in SegmentCacheManager too, which wraps multiple locations.
- SegmentCacheManager wherever possible. There is no change in the functionality.
Key changed/added classes in this PR:
- SegmentLoader
- SegmentLoaderLocalCacheManager
- StorageLocation
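To illustrate why the constructor visibility matters, here is a minimal sketch with hypothetical classes: RefCountedSegmentStub stands in for ReferenceCountingSegment and models only the reference count, and AuditingSegment is an extension-defined subclass. Because the loader now returns the wrapper, a custom loader can hand out its own subclass.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ReferenceCountingSegment; only the reference counting is modeled.
class RefCountedSegmentStub
{
  private final String segmentId;
  private final AtomicInteger refCount = new AtomicInteger();

  // Protected rather than private, so extensions can subclass.
  protected RefCountedSegmentStub(String segmentId)
  {
    this.segmentId = segmentId;
  }

  boolean increment()
  {
    refCount.incrementAndGet();
    return true;
  }

  void decrement()
  {
    refCount.decrementAndGet();
  }

  int references()
  {
    return refCount.get();
  }

  String getId()
  {
    return segmentId;
  }
}

// A custom subclass an extension might define, counting every acquisition across all segments.
class AuditingSegment extends RefCountedSegmentStub
{
  static final AtomicInteger TOTAL_ACQUISITIONS = new AtomicInteger();

  AuditingSegment(String segmentId)
  {
    super(segmentId);
  }

  @Override
  boolean increment()
  {
    TOTAL_ACQUISITIONS.incrementAndGet();
    return super.increment();
  }
}

// The loader, not the segment manager, decides which wrapper to return.
interface RefCountingLoader
{
  RefCountedSegmentStub getSegment(String segmentId);
}

class AuditingLoader implements RefCountingLoader
{
  @Override
  public RefCountedSegmentStub getSegment(String segmentId)
  {
    return new AuditingSegment(segmentId);
  }
}
```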