This repository was archived by the owner on Jun 14, 2024. It is now read-only.

apoorvedave1 (Contributor) commented Oct 9, 2020

What is the context for this pull request?

Fixes #193

What changes were proposed in this pull request?

Exclude an index from the candidates that can be picked when:

  • hybrid scan is disabled AND
  • either deletedFiles or appendedFiles in the index metadata is non-empty
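
The rule above can be sketched in Scala as a plain filter. This is a minimal sketch, not Hyperspace's actual code: IndexInfo and candidatesWithoutHybridScan are hypothetical names, and signatureValid is modeled as a precomputed Boolean rather than the real signature check.

```scala
// Hypothetical, simplified stand-in for the index metadata this PR inspects.
final case class IndexInfo(
    name: String,
    created: Boolean,           // the index has been created and is usable
    signatureValid: Boolean,    // stands in for the real signature check
    appendedFiles: Set[String], // files the metadata records as appended
    deletedFiles: Set[String])  // files the metadata records as deleted

object CandidateFilter {
  // Hybrid scan disabled: keep an index only if it is created, its signature
  // is valid, and its metadata records no appended or deleted files.
  def candidatesWithoutHybridScan(indexes: Seq[IndexInfo]): Seq[IndexInfo] =
    indexes.filter { index =>
      index.created && index.signatureValid &&
      index.deletedFiles.isEmpty && index.appendedFiles.isEmpty
    }
}
```

With this filter, an index whose metadata lists even one appended or deleted file is dropped although its signature is valid, while an index with no recorded changes is kept.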

Does this PR introduce any user-facing change?

No. It's a bug fix.

How was this patch tested?

Added unit tests.

apoorvedave1 and others added 30 commits October 7, 2020 20:39
Co-authored-by: Terry Kim <yuminkim@gmail.com>
apoorvedave1 self-assigned this Oct 10, 2020
indexes.filter(
  index =>
    index.created && signatureValid(index) &&
      index.deletedFiles.isEmpty && index.appendedFiles.isEmpty)
Collaborator

This would be OK, but it might cause a bad user experience:

  1. the user runs refreshIncremental
  2. the index fails to apply
  3. no idea or clue about the failure until the user checks the index log entry

Contributor Author

Thanks @sezruby for bringing this up. I think that's OK; this behavior is expected. The experience is exactly the same as in the following case:

  1. user creates index
  2. user updates source data
  3. fail to apply index
  4. no idea or clue about the failure until the user checks the index log entry

Please feel free to suggest how to improve this experience. We can create an issue for it and fix it in a subsequent PR.

On the other hand, if we don't do this, it will lead to incorrect results.

Contributor

The current user experience is not good either. For example, with hybrid scan off, if you add a file, the index fails the signatureValid check without the user being told. We have been talking about exposing a "why not" API that tells the user why an index was not picked up. Until we introduce that kind of API, I think the user experience will remain "not desirable", meaning not good. 😄
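
To make the "why not" idea concrete, here is a purely hypothetical Scala sketch; no such API exists in this PR, and IndexStatus and WhyNotCandidate.whyNot are invented for illustration. Instead of silently filtering an index out, it returns human-readable reasons, where an empty list means the index remains a candidate (assuming hybrid scan is disabled).

```scala
// Hypothetical snapshot of the facts the candidate filter looks at.
final case class IndexStatus(
    name: String,
    created: Boolean,
    signatureValid: Boolean,
    appendedFiles: Set[String],
    deletedFiles: Set[String])

object WhyNotCandidate {
  // Collects every reason an index would be rejected while hybrid scan is
  // disabled; an empty result means the index is still a candidate.
  def whyNot(index: IndexStatus): List[String] = {
    val checks = List(
      (!index.created) -> "index is not in a created state",
      (!index.signatureValid) -> "source data signature does not match the index",
      index.appendedFiles.nonEmpty ->
        s"metadata lists ${index.appendedFiles.size} appended file(s) and hybrid scan is disabled",
      index.deletedFiles.nonEmpty ->
        s"metadata lists ${index.deletedFiles.size} deleted file(s) and hybrid scan is disabled")
    // Keep only the messages whose condition is true.
    checks.collect { case (true, reason) => reason }
  }
}
```

Returning every failing check rather than only the first one would let a user see, for example, both a signature mismatch and recorded deleted files at once.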

Contributor

I think @apoorvedave1 beat me to this comment 😄

Collaborator

@imback82 @apoorvedave1
This is why I suggested failing fast (with an assert) rather than storing deletedFiles, though maybe we could use it for hybrid scan later.

Now that we are proposing a "mutable" dataset, I think we need to address this issue with more care.

Anyway, I'm fine with the current approach, as it's also valid :)

Thanks all!

Contributor

OK, in this case, I still don't understand 😄. @rapoth do you understand this scenario?

Collaborator

OK, let me try again 😁

  1. Disable the "global" hybrid scan config.
  2. The user can choose some "beneficial" indexes (those with a high performance improvement) that have a small diff, and quick-refresh those indexes.
  3. Hybrid scan can be performed for the quick-refreshed indexes even if the global config is disabled. The other, unrefreshed indexes with a diff won't be candidates for hybrid scan.
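
The three steps above can be sketched as a per-index decision. This is a hedged illustration of the proposal under discussion, not the behavior this PR merged (which drops such indexes when hybrid scan is disabled). Decision, RefreshedIndex, and SelectiveHybridScan.decide are hypothetical names, and signatureMatches assumes that a quick refresh makes the signature match while recording the diff in appendedFiles/deletedFiles.

```scala
// Possible outcomes for one index under the proposal (hypothetical names).
sealed trait Decision
case object UseIndex extends Decision               // index matches the source exactly
case object UseIndexWithHybridScan extends Decision // quick-refreshed index with a recorded diff
case object SkipIndex extends Decision              // stale index that was not refreshed

final case class RefreshedIndex(
    name: String,
    signatureMatches: Boolean,
    appendedFiles: Set[String],
    deletedFiles: Set[String])

object SelectiveHybridScan {
  // Decision while the global hybrid scan config is disabled (step 1).
  def decide(index: RefreshedIndex): Decision =
    if (!index.signatureMatches) SkipIndex // unrefreshed index with a diff
    else if (index.appendedFiles.isEmpty && index.deletedFiles.isEmpty) UseIndex
    else UseIndexWithHybridScan // the quick refresh recorded the diff, so hybrid scan can cover it
}
```

In other words, hybrid scan is applied without the global config only when the signature matches, which is the summary agreed on below.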

Contributor

Thanks 👍. So, in a nutshell, selective hybrid scan means "we can apply hybrid scan even if the user didn't enable it, but only when the signature matches", right?

Collaborator

Right. That's the point 👍

Contributor

Thanks!

apoorvedave1 marked this pull request as ready for review October 10, 2020 04:24
apoorvedave1 requested a review from sezruby October 10, 2020 04:50
apoorvedave1 added this to the 0.4.0 milestone Oct 10, 2020

imback82 (Contributor) left a comment

A few minor comments, but LGTM. Thanks, @apoorvedave1!

apoorvedave1 requested a review from imback82 October 10, 2020 07:21
imback82 merged commit 0de4734 into microsoft:master Oct 10, 2020
imback82 (Contributor)

@apoorvedave1 Can you help create an issue to improve the user experience when indexes are not selected (e.g., the "why not" API)? I think we should address this sooner rather than later (a good candidate for the next release, 0.5). Thanks!

apoorvedave1 deleted the bug_193 branch October 12, 2020 01:24
rapoth (Contributor) commented Oct 12, 2020

@apoorvedave1 Can you take care of creating this issue with the relevant details?

Linked issue:

Hyperspace should not pick indexes if hybrid scan is disabled and "appendedFiles" or "deletedFiles" in metadata is non-empty

4 participants