Merge append and delete actions on indexes on modified source data #187
Co-authored-by: Rahul Potharaju <rapoth@microsoft.com>
RefreshIncremental.scala is for the same purpose.
# Conflicts:
# src/main/scala/com/microsoft/hyperspace/index/IndexLogEntry.scala
# src/test/scala/com/microsoft/hyperspace/index/IndexLogEntryTest.scala
# Conflicts:
# src/main/scala/com/microsoft/hyperspace/actions/RefreshAction.scala
# src/main/scala/com/microsoft/hyperspace/actions/RefreshActionBase.scala
# src/main/scala/com/microsoft/hyperspace/actions/RefreshDeleteAction.scala
# src/main/scala/com/microsoft/hyperspace/index/IndexCollectionManager.scala
# src/main/scala/com/microsoft/hyperspace/index/IndexConstants.scala
# src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala
…s.scala Co-authored-by: Rahul Potharaju <rapoth@microsoft.com>
…s.scala Co-authored-by: Rahul Potharaju <rapoth@microsoft.com>
…s.scala Co-authored-by: Rahul Potharaju <rapoth@microsoft.com>
src/test/scala/com/microsoft/hyperspace/index/RefreshIndexTests.scala
sezruby
left a comment
Please don't forget to handle this in 0.4: #187 (comment)
src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala
Thanks for the reminders. I prefer not to take this in this PR.
test("Verify Join Index Rule utilizes indexes correctly after incremental refresh.") {
test("Verify JoinIndexRule utilizes indexes correctly after incremental refresh (append-only).") {
NOTE: this is a preexisting test with only the REFRESH_APPEND_ENABLED flag removed.
src/main/scala/com/microsoft/hyperspace/index/IndexCollectionManager.scala
Can you or @thrajput also update the Python binding? @AFFogarty can help on the C# side once 0.3 is released. (I will ping you @AFFogarty 😄 when it's ready)
src/main/scala/com/microsoft/hyperspace/actions/RefreshAction.scala
src/main/scala/com/microsoft/hyperspace/actions/RefreshAppendAction.scala
src/main/scala/com/microsoft/hyperspace/index/CachingIndexCollectionManager.scala
src/main/scala/com/microsoft/hyperspace/index/IndexCollectionManager.scala
src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala
src/test/scala/com/microsoft/hyperspace/index/E2EHyperspaceRulesTests.scala
src/test/scala/com/microsoft/hyperspace/index/IndexCollectionManagerTest.scala
…anager.scala Co-authored-by: Terry Kim <yuminkim@gmail.com>
imback82
left a comment
LGTM, thanks @apoorvedave1!
I will wait for others to approve as well before merging.
NOTES TO REVIEWERS:
Changes include:
Reviewers are NOT required to review PR #170 (Update index log entry for enforce delete during read time) to review this PR.
What is the context for this pull request?
Hyperspace currently handles append support and delete support on modifiable source data separately. Users can either allow deletion of some source files and refresh their index, or they can allow adding files to data and refresh the index.
This PR merges these operations, which will enable users to have a fully modifiable data source. Users can delete some of the indexed source files and also append some new source files. A refreshIndex(indexName, mode = "incremental") operation will allow users to update their indexes to accommodate all these changes.
What changes were proposed in this pull request?
1. Merge the behavior of RefreshDeleteAction and RefreshAppendAction in a single user-facing API, refreshIndex(indexName, mode = "incremental"). When the user calls this API, Hyperspace should update the index by removing records from all the deleted source files and adding records from the newly added source files.
2. PR #188 (MERGED): IndexLogEntry will support storing "appended" and "deleted" files (these changes were picked from #170 for correctness).
3. PR #191 (MERGED): Update RefreshDeleteAction and RefreshAppendAction to also update the "appended" and "deleted" file lists in index metadata.
4. PR #189 (MERGED): If no updates to the data source are found, it will NOT throw an exception anymore. This means:
5. PR #194 (MERGED): Bug fix: if "appended" or "deleted" files are present in index metadata and hybrid scan is disabled, do NOT use those indexes. Otherwise this will lead to incorrect results.
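The merged behavior listed above can be illustrated with a small, library-free model. This is only a sketch under stated assumptions, not Hyperspace's implementation: IndexRecord, SourceDiff, diff, refreshIncremental and isUsable are hypothetical names introduced here, and an "index" is reduced to a list of (source file, key) records.

```scala
// Hypothetical, simplified model; none of these names come from Hyperspace.
final case class IndexRecord(sourceFile: String, key: Int)

// Files added to / removed from the source since the index was last built.
final case class SourceDiff(appended: Set[String], deleted: Set[String])

object IncrementalRefreshSketch {
  // Compare the files recorded at index time with the files present now.
  def diff(indexedFiles: Set[String], currentFiles: Set[String]): SourceDiff =
    SourceDiff(
      appended = currentFiles -- indexedFiles,
      deleted = indexedFiles -- currentFiles)

  // One pass handles both cases: drop records that came from deleted files,
  // then add records read from appended files. An empty diff leaves the
  // index unchanged instead of failing.
  def refreshIncremental(
      index: Seq[IndexRecord],
      d: SourceDiff,
      readFile: String => Seq[Int]): Seq[IndexRecord] = {
    val kept = index.filterNot(r => d.deleted.contains(r.sourceFile))
    val added = d.appended.toSeq.sorted.flatMap(f => readFile(f).map(k => IndexRecord(f, k)))
    kept ++ added
  }

  // Necessary condition from the bug fix: with hybrid scan disabled, an index
  // whose metadata lists appended or deleted files must not be used.
  def isUsable(d: SourceDiff, hybridScanEnabled: Boolean): Boolean =
    hybridScanEnabled || (d.appended.isEmpty && d.deleted.isEmpty)
}
```

Computing the appended and deleted sets first and then applying both in a single step mirrors the idea of merging the two refresh actions; the real actions operate on index files and lineage information rather than in-memory records.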
Does this PR introduce any user-facing change?
Yes. This PR exposes refreshIndex(indexName, mode = "incremental") to users. Users can choose this over mode = "full" for a faster refresh of index files.
How was this patch tested?
Added unit tests where we delete some files from the original indexed data and then append new data to the original data. A refreshIndex(indexName, mode = "incremental") call will update the indexes such that the newly created index files reflect the most recent version of the data.
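The scenario exercised by these tests might look roughly like the following from a user's point of view. This is a hedged sketch, not code from the PR: it needs Spark and Hyperspace on the classpath, the index name, column names and temporary path are made up for illustration, and it assumes that lineage has to be enabled (via the spark.hyperspace.index.lineage.enabled setting) when the index is created so that deleted source files can be handled.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import com.microsoft.hyperspace.Hyperspace
import com.microsoft.hyperspace.index.IndexConfig

object IncrementalRefreshUsage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("incremental-refresh-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumption: lineage is required for handling deleted source files.
    spark.conf.set("spark.hyperspace.index.lineage.enabled", "true")

    // Hypothetical source data written as two parquet files.
    val dataPath = Files.createTempDirectory("hs-sketch").resolve("data").toString
    Seq((1, "a"), (2, "b"), (3, "c"), (4, "d"))
      .toDF("id", "name")
      .repartition(2)
      .write.parquet(dataPath)

    // Index the original data ("sketchIndex" is a made-up name).
    val hs = new Hyperspace(spark)
    hs.createIndex(
      spark.read.parquet(dataPath),
      IndexConfig("sketchIndex", Seq("id"), Seq("name")))

    // Delete one of the indexed source files ...
    new File(dataPath).listFiles()
      .filter(_.getName.endsWith(".parquet"))
      .headOption
      .foreach(_.delete())

    // ... and append new data to the same location.
    Seq((5, "e")).toDF("id", "name").write.mode("append").parquet(dataPath)

    // A single incremental refresh accounts for both kinds of change.
    hs.refreshIndex("sketchIndex", "incremental")

    spark.stop()
  }
}
```

Running this twice against the same index location would fail at createIndex because the index name already exists; the sketch leaves cleanup out for brevity.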