This repository was archived by the owner on Jun 14, 2024. It is now read-only.
Add RefreshIncrementalAction to support index creation on newly appended data #163
Merged
This was referenced Sep 14, 2020
Contributor comment: @apoorvedave1 Can you look at the suggestions I made for #142 as far as the section
sezruby reviewed on Oct 5, 2020 (all threads resolved):
- src/main/scala/com/microsoft/hyperspace/actions/RefreshIncremental.scala (4 threads, 3 outdated)
- src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala (2 threads, outdated)
- src/test/scala/com/microsoft/hyperspace/index/IndexManagerTests.scala (1 thread, outdated)
sezruby reviewed on Oct 6, 2020 (all threads resolved, all outdated):
- src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala (1 thread)
- src/test/scala/com/microsoft/hyperspace/index/IndexManagerTests.scala (2 threads)
- src/main/scala/com/microsoft/hyperspace/actions/RefreshIncrementalAction.scala (2 threads)
- src/main/scala/com/microsoft/hyperspace/actions/CreateActionBase.scala (1 thread)
imback82 reviewed on Oct 6, 2020 (all threads resolved):
- src/main/scala/com/microsoft/hyperspace/actions/RefreshIncrementalAction.scala (8 threads, 5 outdated)
- src/main/scala/com/microsoft/hyperspace/telemetry/HyperspaceEvent.scala (1 thread, outdated)
imback82 reviewed again on Oct 6, 2020 (all threads resolved):
- src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala (6 threads, 5 outdated)
- src/test/scala/com/microsoft/hyperspace/index/IndexManagerTests.scala (4 threads, outdated)
apoorvedave1 commented twice on Oct 7, 2020.
imback82 (Contributor) approved these changes on Oct 7, 2020: "LGTM, thanks @apoorvedave1!"
sezruby (Collaborator) approved these changes on Oct 7, 2020: "LGTM, thanks @apoorvedave1!"
Labels:
- advanced issue: This is the tag for advanced issues which involve major design changes or introduction
- enhancement: New feature or request
Implement incremental indexing support for append-only data.
NOTE TO REVIEWERS
Uber issue: see #136 for complete details.
This PR depends on PR #142 (delete support on source data) and PR #162 (merge support for directory objects) for major changes. Until those PRs are merged, this PR will stay WIP. Reviewers can still read it for a general idea of how the classes will evolve.
Reviewers who are already familiar with the files from the dependency PRs can ignore them here. The new files that contain new functionality, and that need review, are:
- Functionality: RefreshIncremental.scala
- Tests: IndexManagerTests.scala

What this PR does
This feature allows Hyperspace to index newly arrived data. If a user appends new data to existing, already-indexed data, they can use the `refresh` API to build index data only for the appended data. This is faster than a full refresh because it processes only the additional data. A full refresh, by contrast, rebuilds the index from scratch over all of the data.
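The difference between the two refresh modes comes down to which source files get read. The sketch below is a minimal plain-Scala illustration, not the Hyperspace implementation. It assumes a source file is identified by its path, size, and modification time. The names `SourceFile`, `filesToIndex`, and `incremental` are hypothetical.

```scala
// Hypothetical model of a source data file; Hyperspace's real metadata types differ.
final case class SourceFile(path: String, size: Long, modifiedTime: Long)

// Returns the files a refresh must read and index.
// Full refresh: every current source file (the index is rebuilt from scratch).
// Incremental (append-only): only files not already recorded in the index.
def filesToIndex(
    indexed: Set[SourceFile],
    current: Set[SourceFile],
    incremental: Boolean): Set[SourceFile] =
  if (incremental) current -- indexed else current

val part0 = SourceFile("/data/part-0.parquet", 1024L, 1L)
val part1 = SourceFile("/data/part-1.parquet", 2048L, 2L)
val indexed = Set(part0)        // files covered by the existing index
val current = Set(part0, part1) // part-1 was appended after the index was built

println(filesToIndex(indexed, current, incremental = false).map(_.path)) // both files
println(filesToIndex(indexed, current, incremental = true).map(_.path))  // only part-1
```

The cost of an incremental refresh scales with the size of the appended files rather than the whole dataset. That is the source of the speedup described above.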
Algorithm Outline:
What changes were proposed in this pull request?
A new `RefreshIncremental.scala` action class, built on the `RefreshActionBase` class. Reviewers can start from this class to understand which data is indexed and how the new metadata is generated to reflect the latest state of the index.
Why are the changes needed?
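The key idea behind the new metadata is that an incremental refresh extends the previous index state instead of replacing it. The sketch below is a simplified model under stated assumptions and is not the actual `RefreshIncremental` code. The `IndexMetadata` type, the `incrementalRefresh` function, and the per-build data directory names (`v0`, `v1`, ...) are all hypothetical.

```scala
// Hypothetical, simplified view of an index's metadata after a refresh.
final case class IndexMetadata(
    indexedFiles: Set[String], // source files covered by the index
    dataDirs: Seq[String])     // directories holding index data, one per build

// Extends the previous metadata with newly built index data instead of replacing it.
def incrementalRefresh(
    prev: IndexMetadata,
    currentFiles: Set[String],
    newDataDir: String): IndexMetadata = {
  // Append-only assumption: every previously indexed file must still exist.
  require(prev.indexedFiles.subsetOf(currentFiles),
    "source files were deleted; append-only refresh cannot handle this")
  val appended = currentFiles -- prev.indexedFiles
  if (appended.isEmpty) prev // nothing new to index, keep the metadata unchanged
  else IndexMetadata(prev.indexedFiles ++ appended, prev.dataDirs :+ newDataDir)
}

val v0 = IndexMetadata(Set("part-0"), Seq("v0"))
val v1 = incrementalRefresh(v0, Set("part-0", "part-1"), "v1")
println(v1)
```

The `require` check reflects the append-only scope of this PR. Handling deleted source files is the subject of the dependency PR #142.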
To support incremental indexing of only the data that has not yet been indexed.
Does this PR introduce any user-facing change?
Yes. This feature adds support for creating (or "updating") an index on newly added data by indexing only the new data.
How to enable this feature
This feature is currently hidden behind the flag `spark.hyperspace.index.refresh.append.enabled`, which defaults to `false`. It will later be exposed through the API `refreshIndex(indexName, mode="quick")`, which will also support other features (e.g. delete). See #136 for complete details.
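The sketch below shows how a boolean feature flag with a `false` default can gate the choice between full and incremental refresh. Only the flag name and its default come from the description above. The dispatch logic and the names `RefreshMode`, `FullRefresh`, `IncrementalAppendRefresh`, and `refreshMode` are hypothetical. A plain `Map` stands in for Spark's runtime configuration. In a real Spark session you would set the flag with `spark.conf.set("spark.hyperspace.index.refresh.append.enabled", "true")` before triggering a refresh.

```scala
// Flag name and default are taken from the PR description; the dispatch is illustrative.
val AppendRefreshFlag = "spark.hyperspace.index.refresh.append.enabled"

sealed trait RefreshMode
case object FullRefresh extends RefreshMode
case object IncrementalAppendRefresh extends RefreshMode

// Reads the flag from a key/value config; a missing key means "false" (full refresh).
// String.toBoolean accepts "true"/"false" case-insensitively and throws otherwise.
def refreshMode(conf: Map[String, String]): RefreshMode =
  if (conf.getOrElse(AppendRefreshFlag, "false").toBoolean) IncrementalAppendRefresh
  else FullRefresh

println(refreshMode(Map.empty))                        // FullRefresh
println(refreshMode(Map(AppendRefreshFlag -> "true"))) // IncrementalAppendRefresh
```

Defaulting to `false` keeps the existing full-refresh behavior for anyone who has not opted in.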
How was this patch tested?