Make SegmentLoader extensible and customizable #11398
Changes from all commits
0af368e
92cbce4
1153abb
f62af2d
0e8d0bf
0408a52
2e6fd24
215c0f0
47e5c65
1482561
bbba704
c7ce26d
cc68ede
2873386
b1c874b
a549256
89cd7e5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,18 +30,58 @@ | |
| public interface SegmentCacheManager | ||
| { | ||
| /** | ||
| * Checks whether a segment is already cached. | ||
| * Checks whether a segment is already cached. It can return false even if {@link #reserve(DataSegment)} | ||
| * has succeeded for a segment that has not been downloaded yet. | ||
| */ | ||
| boolean isSegmentCached(DataSegment segment); | ||
|
|
||
| /** | ||
| * This method fetches the files for the given segment if the segment is not downloaded already. | ||
| * This method fetches the files for the given segment if the segment is not downloaded already. It | ||
| * is not required to {@link #reserve(DataSegment)} before calling this method. If the caller has not reserved | ||
| * the space explicitly via {@link #reserve(DataSegment)}, the implementation should reserve space on the caller's | ||
| * behalf. | ||
| * If the space has been explicitly reserved already | ||
| * - implementation should use only the reserved space to store segment files. | ||
| * - implementation should not release the location in case of download errors and should leave that to the caller. | ||
| * @throws SegmentLoadingException if there is an error in downloading files | ||
| */ | ||
| File getSegmentFiles(DataSegment segment) throws SegmentLoadingException; | ||
|
|
||
| /** | ||
| * Cleanup the cache space used by the segment | ||
| * Tries to reserve the space for a segment on any location. When the space has been reserved, | ||
| * {@link #getSegmentFiles(DataSegment)} should download the segment on the reserved location or | ||
| * fail otherwise. | ||
| * | ||
| * This function is useful for custom extensions. Extensions can try to reserve the space first and, | ||
| * if not successful, make some space by cleaning up other segments, etc. This function also improves | ||
| * concurrency for extensions: reserve is a cheap operation, so it can be invoked inside a lock | ||
| * if the extension requires that. getSegment cannot be put inside a lock | ||
| * since it is a time-consuming operation, on account of downloading the files. | ||
| * | ||
| * @param segment - Segment to reserve | ||
| * @return True if enough space found to store the segment, false otherwise | ||
| */ | ||
| /* | ||
| * We only return a boolean result instead of a pointer to | ||
| * {@link StorageLocation} since we don't want callers to operate on {@code StorageLocation} directly outside {@code SegmentLoader}. | ||
| * {@link SegmentLoader} operates on the {@code StorageLocation} objects in a thread-safe manner. | ||
| */ | ||
| boolean reserve(DataSegment segment); | ||
|
|
||
| /** | ||
| * Reverts the effects of {@link #reserve(DataSegment)} by releasing the location reserved for this segment. | ||
| * Callers that explicitly reserve the space via {@link #reserve(DataSegment)} should use this method to release the space. | ||
| * | ||
| * The implementation can throw an error if the space is being released while data is still present. Callers | ||
| * are supposed to ensure that any data is removed via {@link #cleanup(DataSegment)}. | ||
| * @param segment - Segment to release the location for. | ||
| * @return - True if any location was reserved and released, false otherwise. | ||
| */ | ||
| boolean release(DataSegment segment); | ||
|
Contributor
Can you clarify the contract between this method and
Contributor
Author
Sure. if
||
|
|
||
| /** | ||
| * Cleanup the cache space used by the segment. It will not release the space if the space has been | ||
| * explicitly reserved via {@link #reserve(DataSegment)} | ||
| */ | ||
| void cleanup(DataSegment segment); | ||
| } | ||
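
The reserve/download/cleanup/release contract described in the javadoc above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ReserveSketch`, its string segment ids, the byte-counting capacity, and the `load` helper are all hypothetical stand-ins, not Druid's `SegmentCacheManager` or `StorageLocation`, and `IllegalStateException` stands in for `SegmentLoadingException`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the contract; NOT Druid's implementation.
final class ReserveSketch
{
  private static final class Entry
  {
    final long size;
    final boolean explicit;   // true if the caller reserved the space itself
    boolean downloaded;

    Entry(long size, boolean explicit)
    {
      this.size = size;
      this.explicit = explicit;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private long freeBytes;

  ReserveSketch(long capacityBytes)
  {
    this.freeBytes = capacityBytes;
  }

  synchronized boolean reserve(String id, long size)
  {
    return reserveInternal(id, size, true);
  }

  private boolean reserveInternal(String id, long size, boolean explicit)
  {
    if (entries.containsKey(id)) {
      return true;            // assumption: reserving twice is a no-op
    }
    if (size > freeBytes) {
      return false;
    }
    freeBytes -= size;
    entries.put(id, new Entry(size, explicit));
    return true;
  }

  // Stays false after reserve() until the files are actually present.
  synchronized boolean isSegmentCached(String id)
  {
    Entry e = entries.get(id);
    return e != null && e.downloaded;
  }

  // Reserves on the caller's behalf when no explicit reservation exists.
  synchronized void getSegmentFiles(String id, long size)
  {
    if (!reserveInternal(id, size, false)) {
      throw new IllegalStateException("no space for " + id);
    }
    entries.get(id).downloaded = true;   // stands in for the real download
  }

  // Removes data; frees the space only if it was not explicitly reserved.
  synchronized void cleanup(String id)
  {
    Entry e = entries.get(id);
    if (e == null) {
      return;
    }
    e.downloaded = false;
    if (!e.explicit) {
      entries.remove(id);
      freeBytes += e.size;
    }
  }

  synchronized boolean release(String id)
  {
    Entry e = entries.get(id);
    if (e == null) {
      return false;
    }
    if (e.downloaded) {
      throw new IllegalStateException("data still present for " + id);
    }
    entries.remove(id);
    freeBytes += e.size;
    return true;
  }

  synchronized long freeBytes()
  {
    return freeBytes;
  }

  // Extension-style usage: the cheap reserve step runs under the extension's
  // lock, the slow download runs outside it.
  static boolean load(ReserveSketch cache, Object extensionLock, String id, long size, String victim)
  {
    synchronized (extensionLock) {
      if (!cache.reserve(id, size)) {
        cache.cleanup(victim);   // remove the victim's data first
        cache.release(victim);   // then give back its reservation, if any
        if (!cache.reserve(id, size)) {
          return false;
        }
      }
    }
    cache.getSegmentFiles(id, size);
    return true;
  }

  public static void main(String[] args)
  {
    ReserveSketch cache = new ReserveSketch(100);
    cache.reserve("a", 60);
    System.out.println("cached after reserve: " + cache.isSegmentCached("a"));
    cache.getSegmentFiles("a", 60);
    System.out.println("loaded b: " + load(cache, new Object(), "b", 60, "a"));
  }
}
```

In this sketch the ordering matters: `cleanup` must run before `release`, because `release` refuses to free a location that still holds data, which mirrors the wording of the `release` javadoc.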
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,24 +19,34 @@ | |
|
|
||
| package org.apache.druid.segment.loading; | ||
|
|
||
| import org.apache.druid.segment.Segment; | ||
| import org.apache.druid.guice.annotations.UnstableApi; | ||
| import org.apache.druid.segment.ReferenceCountingSegment; | ||
| import org.apache.druid.segment.SegmentLazyLoadFailCallback; | ||
| import org.apache.druid.timeline.DataSegment; | ||
|
|
||
| /** | ||
| * Loading segments from deep storage to local storage. Internally, this class can delegate the download to | ||
| * {@link SegmentCacheManager}. Implementations must be thread-safe. | ||
| */ | ||
| @UnstableApi | ||
| public interface SegmentLoader | ||
| { | ||
|
|
||
| /** | ||
| * Builds a {@link Segment} by downloading if necessary | ||
| * Returns a {@link ReferenceCountingSegment} that will be added by the {@link org.apache.druid.server.SegmentManager} | ||
| * to the {@link org.apache.druid.timeline.VersionedIntervalTimeline}. This method can be called multiple times | ||
| * by the {@link org.apache.druid.server.SegmentManager}, and the implementation can return either the same {@link ReferenceCountingSegment} | ||
| * or a different {@link ReferenceCountingSegment}. Callers should not assume any particular behavior. | ||
| * | ||
| * Returning a {@code ReferenceCountingSegment} will let custom implementations keep track of reference count for | ||
| * segments that the custom implementations are creating. That way, custom implementations can know when the segment | ||
| * is in use or not. | ||
| * @param segment - Segment to load | ||
| * @param lazy - Whether column metadata de-serialization is to be deferred to access time. Setting this flag to true can speed up segment loading | ||
| * @param loadFailed - Callback to invoke if lazy loading fails during column access. | ||
| * @throws SegmentLoadingException - If there is an error in loading the segment | ||
| */ | ||
| Segment getSegment(DataSegment segment, boolean lazy, SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException; | ||
| ReferenceCountingSegment getSegment(DataSegment segment, boolean lazy, SegmentLazyLoadFailCallback loadFailed) throws SegmentLoadingException; | ||
|
Contributor
Is the SegmentLoader guaranteed to return the same ReferenceCountingSegment instance across multiple calls of getSegment? Should it?
Contributor
Author
It can do either and the caller is not supposed to depend on that behavior. From the caller's perspective, it is going to get a segment object wrapped inside
Contributor
I think this should be documented in the javadoc.
||
|
|
||
| /** | ||
| * cleanup any state used by this segment | ||
|
|
||
Should `isSegmentCached` still return false after `reserve` is called? It would be worth documenting.
Yes, I will document it.
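
The reference-counting point from the `getSegment` thread can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `CountedSegment` is a hypothetical stand-in for `ReferenceCountingSegment` (whose real API is not shown in this diff), and `CachingLoader` is one possible loader that happens to return the same wrapper on repeated calls, which callers should not rely on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types are Druid classes.
final class RefCountSketch
{
  // Stand-in for a reference-counted segment wrapper.
  static final class CountedSegment
  {
    final String id;
    private final AtomicInteger refs = new AtomicInteger();

    CountedSegment(String id)
    {
      this.id = id;
    }

    void acquire()
    {
      refs.incrementAndGet();
    }

    void release()
    {
      refs.decrementAndGet();
    }

    boolean inUse()
    {
      return refs.get() > 0;
    }
  }

  // A loader that reuses wrappers. Another loader could build a fresh
  // wrapper per call; the caller's code should work with either.
  static final class CachingLoader
  {
    private final Map<String, CountedSegment> loaded = new ConcurrentHashMap<>();

    CountedSegment getSegment(String id)
    {
      return loaded.computeIfAbsent(id, CountedSegment::new);
    }

    // Because the loader hands out the counted wrapper, it can tell
    // whether a segment is still referenced before dropping it.
    boolean canEvict(String id)
    {
      CountedSegment s = loaded.get(id);
      return s == null || !s.inUse();
    }
  }

  public static void main(String[] args)
  {
    CachingLoader loader = new CachingLoader();
    CountedSegment segment = loader.getSegment("seg-1");
    segment.acquire();
    System.out.println("evictable while in use: " + loader.canEvict("seg-1"));
    segment.release();
    System.out.println("evictable after release: " + loader.canEvict("seg-1"));
  }
}
```

The design choice being illustrated is that returning the counted wrapper, rather than a bare segment, lets a custom loader observe usage without the caller exposing any extra bookkeeping.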