Introduce IndexLogEntryTag for auxiliary data while applying rules #223
@imback82 @pirz @apoorvedave1 Please review this change when you have the time. Thanks!
Sorry for the delay. I will try to get to this this weekend.
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
imback82
left a comment
// Tags are used while applying rules.
// INDEX_HYBRIDSCAN_REQUIRED_TAG indicates if Hybrid Scan is required for this index or not.
// This is set in getCandidateIndexes and utilized in transformPlanToUseIndex.
val INDEX_HYBRIDSCAN_REQUIRED_TAG: IndexLogEntryTag[Boolean] =
  IndexLogEntryTag[Boolean]("hybridScanRequired")
This file is getting out of control with random constants. Can you create IndexLogEntryTags and put tag-related constants there? I should be able to use it as IndexLogEntryTags.HYBRID_SCAN_REQUIRED, for example.
@imback82 Could you check the change and merge this change? Thanks!
imback82
left a comment
LGTM except for a few nits, thanks @sezruby!
src/main/scala/com/microsoft/hyperspace/index/IndexLogEntry.scala
object IndexLogEntryTags {
  // INDEX_HYBRIDSCAN_REQUIRED_TAG indicates if Hybrid Scan is required for this index or not.
  // This is set in getCandidateIndexes and utilized in transformPlanToUseIndex.
  val INDEX_HYBRIDSCAN_REQUIRED_TAG: IndexLogEntryTag[Boolean] =
nit: just call this HYBRIDSCAN_REQUIRED? IndexLogEntryTags.HYBRIDSCAN_REQUIRED sounds fine?
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
@sezruby Can you update the description with the lifetime of tags with the current code? (e.g., per rule)
@imback82 Updated pr description. Thanks!
The index manager could be
Sorry, I don't know why I thought it was "cloned"; this is the reason I didn't use "df" as part of the key of the map at first. 😖 Then, should we add a "max number of caching entries" limit, or is it enough to rely on INDEX_CACHE_EXPIRY_DURATION_SECONDS (default 300s)? Or we could even reuse the index log entry (with tags) if there's no update on the index.
Great. As long as the cached tags do not affect other rules, it's good for now.
Yea, let's think about this separately. Merging to master!
What is the context for this pull request?
This PR is one possible implementation of the first approach in #222.
What changes were proposed in this pull request?
To optimize the rule application process, this PR introduces IndexLogEntryTag, which can store auxiliary
data while rules are applied, and defines a tag for Hybrid Scan.
Tags are stored in each IndexLogEntry instance, which can be accessed by indexManager.getIndexes. Therefore the lifetime of each tag is the same as that of its index instance: if the indexManager returns cached index log entries, the tags remain available and reusable for other queries for as long as the cached log entries are available.
ref) tag implementation from Spark 3.0 - https://github.com/apache/spark/blob/eb9966b70055a67dd02451c78ec205d913a38a42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L93
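The tag mechanism described above can be sketched in a self-contained way. This is a minimal illustration loosely modeled on Spark's TreeNodeTag, not the code in this PR: `TaggedEntry` and `RuleSketch.hybridScanRequired` are hypothetical stand-ins (the real tags live on IndexLogEntry), the method names are borrowed from Spark's TreeNode, and the tag name follows the `HYBRIDSCAN_REQUIRED` suggestion from the review.

```scala
import scala.collection.mutable

// A typed key; T is the type of the value stored under this tag.
final case class IndexLogEntryTag[T](name: String)

object IndexLogEntryTags {
  // Indicates whether Hybrid Scan is required for an index.
  val HYBRIDSCAN_REQUIRED: IndexLogEntryTag[Boolean] =
    IndexLogEntryTag[Boolean]("hybridScanRequired")
}

// Hypothetical stand-in for IndexLogEntry: tags live as long as the instance does.
final class TaggedEntry(val indexName: String) {
  private val tags = mutable.Map.empty[IndexLogEntryTag[_], Any]

  def setTagValue[T](tag: IndexLogEntryTag[T], value: T): Unit =
    tags(tag) = value

  def getTagValue[T](tag: IndexLogEntryTag[T]): Option[T] =
    tags.get(tag).map(_.asInstanceOf[T])

  def unsetTagValue[T](tag: IndexLogEntryTag[T]): Unit = {
    tags.remove(tag)
    ()
  }
}

object RuleSketch {
  // Hypothetical helper: evaluate `compute` only when the tag is not set yet,
  // then store the result so that later lookups on the same entry reuse it.
  def hybridScanRequired(entry: TaggedEntry)(compute: => Boolean): Boolean =
    entry.getTagValue(IndexLogEntryTags.HYBRIDSCAN_REQUIRED).getOrElse {
      val required = compute
      entry.setTagValue(IndexLogEntryTags.HYBRIDSCAN_REQUIRED, required)
      required
    }
}
```

Because the value is stored on the entry itself, a cached entry returned again by the index manager would carry its tags into the next query, which is the lifetime behavior discussed in the conversation above.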
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test