-
Notifications
You must be signed in to change notification settings - Fork 116
Introduce IndexHadoopFsRelation to show applied index name & version in query plan #323
Conversation
22b1688 to
f7ea5a7
Compare
src/main/scala/com/microsoft/hyperspace/index/plans/logical/IndexHadoopFsRelation.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/index/plans/logical/IndexHadoopFsRelation.scala
Show resolved
Hide resolved
imback82
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally seems fine to me. But I need an approval from @apoorvedave1 around the id changes.
src/main/scala/com/microsoft/hyperspace/actions/OptimizeAction.scala
Outdated
Show resolved
Hide resolved
| /** | ||
| * Wrapper class of HadoopFsRelation to indicate index application more explicitly in Plan string. | ||
| */ | ||
| class IndexHadoopFsRelation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for choosing class over case class, same as HadoopFsRelation (since it's a wrapper, I would think both are of same type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because case-to-case inheritance is not proper in Scala.
src/main/scala/com/microsoft/hyperspace/index/plans/logical/IndexHadoopFsRelation.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/actions/CreateAction.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/actions/OptimizeAction.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/actions/OptimizeAction.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/actions/RefreshAction.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/actions/RefreshIncrementalAction.scala
Outdated
Show resolved
Hide resolved
| def logVersion: String = { | ||
| properties.getOrElse(IndexConstants.INDEX_LOG_VERSION_PROPERTY, "undefined") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of introducing a key in the map, can't we simply use super.id directly here?
i mean
def logVersion: String = id.toString
If we do this, we will not even need to introduce endId() def in Actions.scala class
Action.begin(),
Action.end() set this id themselves and it's always up to date with the actual file name of the log entry file.
cc @imback82
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I updated this PR to use LogEntry.id.
Btw I added endId in #272 as the id is required in op().. @apoorvedave1 Could you also have a look at the change?
imback82
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @sezruby!
What is the context for this pull request?
What changes were proposed in this pull request?
Introduce a wrapper class of HadoopFsRelation for explicit plan transformation with index, in plan string.
When applying index, we could use IndexHadoopFsRelation instead of HadoopFsRelation as an identifier.
Instead of "parquet", print s"Index $indexName_$indexLogVersion".
to
Does this PR introduce any user-facing change?
Yes, plan with index changed as following:
current sparkPlan:
SparkPlan with this PR
Logical Plan with this PR
How was this patch tested?
Unit test