-
Notifications
You must be signed in to change notification settings - Fork 116
Disable indexes with non-empty "appended" or "deleted" files if hybrid-scan is disabled #194
Conversation
…into errorhandling
…into bugfixes # Conflicts: # src/test/scala/com/microsoft/hyperspace/TestUtils.scala
Co-authored-by: Terry Kim <yuminkim@gmail.com>
Co-authored-by: Terry Kim <yuminkim@gmail.com>
…into errorhandling
# Conflicts: # src/main/scala/com/microsoft/hyperspace/actions/RefreshDeleteAction.scala # src/test/scala/com/microsoft/hyperspace/index/RefreshIndexTests.scala
| indexes.filter( | ||
| index => | ||
| index.created && signatureValid(index) && | ||
| index.deletedFiles.isEmpty && index.appendedFiles.isEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be ok but might cause some bad user experience:
- refreshIncremental
- failed to apply index
- no idea or clue for the failure until the user checks index log entry ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sezruby for bringing this up, i think that's ok. This behavior is expected. I think this experience is exactly same as the following:
- user creates index
- user updates source data
- fail to apply index
- no idea or clue for the failure until the user checks index log entry ..
Please feel free to suggest how to improve this experience. We can create an issue for the same and fix it in subsequent PRs.
On the other hand, if we don't do this, it will lead to incorrect results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user experience currently is not good either. For example, with hybrid scan off, if you add a file, it will fail in signatureValid without letting the user know. We have been talking about exposing "why not" API that tells the user why an index was not picked up. Until we introduce that kind of API, I think the user experience will remain "not desirable" - meaning not good. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @apoorvedave1 beat me on this comment 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imback82 @apoorvedave1
This is why I suggested quick failure (assert) rather than storing deletedFiles but maybe we could hybrid scan with it later.
Now we are proposing "mutable" dataset, I think we need to address this issue with more care.
Anyway I'm fine with the current approach as it's also valid :)
Thanks all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, in this case, I still don't understand 😄. @rapoth do you understand this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok let me try again 😁
- disable "global" hybrid scan config
- user can choose some "beneficial - with high perf improvement" indexes with little diff and do
quickrefresh for the indexes - hybrid scan can be performed for
quickrefreshed indexes even if the global config is disabled. The other unrefreshed indexes with diff won't be the candidates for hybrid scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 👍 . So in a nutshell, the selective hybrid scan is "we can apply the hybrid scan even if the user didn't enable it only when the signature matches", right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. That's the point 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
| indexes.filter( | ||
| index => | ||
| index.created && signatureValid(index) && | ||
| index.deletedFiles.isEmpty && index.appendedFiles.isEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user experience currently is not good either. For example, with hybrid scan off, if you add a file, it will fail in signatureValid without letting the user know. We have been talking about exposing "why not" API that tells the user why an index was not picked up. Until we introduce that kind of API, I think the user experience will remain "not desirable" - meaning not good. 😄
imback82
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor comments, but LGTM, thanks @apoorvedave1!
src/test/scala/com/microsoft/hyperspace/index/rules/HyperspaceRuleTestSuite.scala
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
|
@apoorvedave1 Can you help creating an issue to address improving the user experience when indexes are not selected (e.g., the "why not" API)? I think we should address this sooner than later (a good candidate for the next release - 0.5). Thanks! |
@apoorvedave1 Can you take care of creating this issue with the relevant details? |
What is the context for this pull request?
Fixes #193
What changes were proposed in this pull request?
Remove any index candidate for being picked when:
deletedFilesorappendedFilesin index metadata is non-emptyDoes this PR introduce any user-facing change?
no. It's a bug fix.
How was this patch tested?
added unit tests