Skip to content
This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Conversation

@apoorvedave1
Copy link
Contributor

@apoorvedave1 apoorvedave1 commented Dec 3, 2020

What is the context for this pull request?

What changes were proposed in this pull request?

Set "basePath" option during creation and refresh of indexes on a hive-partitioned data source. This will ensure that all calls to refreshIndex will pick this basePath while identifying newly appended data files for incremental refresh.

When 'basePath' is known, spark will identify the partition columns automatically and properly pick them to store in the index.

NOTE: This PR doesn't include changes which may be required for Deltalake data source. Please follow up with #295 for delta lake

Does this PR introduce any user-facing change?

no

How was this patch tested?

unit tests

Comment on lines 509 to 511
// Check if partition columns are correctly stored in index contents.
val indexDf2 = spark.read.parquet(s"$systemPath/indexName").where("Date = '2019-10-03'")
assert(appendData.size == indexDf2.count())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this section of the test will fail without the code changes. for the appended data (date = 2019-10-03), the indexDf2.count() will return 0 because "Date = null" for all appended rows).

@apoorvedave1 apoorvedave1 self-assigned this Dec 3, 2020
@apoorvedave1 apoorvedave1 added the bug Something isn't working label Dec 3, 2020
@apoorvedave1 apoorvedave1 added this to the December 2020 milestone Dec 3, 2020
@apoorvedave1 apoorvedave1 requested a review from sezruby December 4, 2020 18:48
@imback82
Copy link
Contributor

imback82 commented Dec 4, 2020

@apoorvedave1 Can you fix the conflicts please?

# Conflicts:
#	src/main/scala/com/microsoft/hyperspace/index/sources/default/DefaultFileBasedSource.scala
#	src/test/scala/com/microsoft/hyperspace/index/IndexManagerTests.scala
@apoorvedave1
Copy link
Contributor Author

@apoorvedave1 Can you fix the conflicts please?

@imback82 , thanks , done

# Conflicts:
#	src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
@apoorvedave1
Copy link
Contributor Author

@imback82 could you please take a look when you get a chance?

Copy link
Collaborator

@sezruby sezruby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for one #281 (comment), LGTM Thanks @apoorvedave1!

And please handle Delta Lake #295 asap as options in lowercase can cause a bug.

@apoorvedave1
Copy link
Contributor Author

Except for one #281 (comment), LGTM Thanks @apoorvedave1!

And please handle Delta Lake #295 asap as options in lowercase can cause a bug.

thank you. I will see if I get the bw. But please feel free to start a PR if you get a chance.

@imback82
Copy link
Contributor

imback82 commented Jan 4, 2021

Store "basePath" in metadata while creating the index on a hive-partitioned data source.

Can you update this? (I think we are setting basePath option when creating a relation). I misunderstood it as updating the index metadata to store this info.

@apoorvedave1
Copy link
Contributor Author

Store "basePath" in metadata while creating the index on a hive-partitioned data source.

Can you update this? (I think we are setting basePath option when creating a relation). I misunderstood it as updating the index metadata to store this info.

thanks @imback82 , yeah that was the original implementation. It changed during the review. I have updated the description

imback82
imback82 previously approved these changes Jan 12, 2021
Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @apoorvedave1!

@sezruby Can you also finish reviewing this PR?

@sezruby
Copy link
Collaborator

sezruby commented Jan 12, 2021

@imback82 one comment: #281 (comment)
but it seems minor. Let's merge this PR :) Thanks @apoorvedave1!

@imback82
Copy link
Contributor

Ah OK. I missed that comment since it was resolved. @apoorvedave1 Can you update it with flatten approach? That seems cleaner. Thanks!

@apoorvedave1
Copy link
Contributor Author

apoorvedave1 commented Jan 12, 2021

Ah OK. I missed that comment since it was resolved. @apoorvedave1 Can you update it with flatten approach? That seems cleaner. Thanks!

done, thanks @imback82 !

@imback82 imback82 merged commit 117a070 into microsoft:master Jan 12, 2021
@imback82 imback82 modified the milestones: December 2020, January 2021 Jan 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants