This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Hive-partition columns are not correctly stored in index content when refresh-incremental is called #280

@apoorvedave1

Describe the issue

When incremental refresh is called on Hive-partitioned data and the partition columns are part of the index columns (indexed or included), the refresh call does not pick them up, and they are filled with nulls in the index.

To Reproduce

  1. Create Hive-partitioned data, e.g. df.write.partitionBy("c1").parquet...
  2. Create an index where c1 is used as either an indexed or an included column.
  3. Append new data to the source.
  4. Call hs.refresh("index", "incremental").
  5. Check the index data. The 'c1' column will be filled with nulls for all appended data.

Expected behavior

After an incremental refresh, the 'c1' column should contain the proper partition values for the appended data instead of nulls.
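
For context, in a Hive-style layout the partition value is encoded in the directory name (e.g. `c1=a/`) rather than stored inside the data files, so anything that reads appended files by path has to recover 'c1' from the path. The following is only a sketch of that recovery in plain Python. It is not Hyperspace's code and does not claim to identify the cause of this bug; the helper names `partition_values` and `with_partition_columns` and the example path are hypothetical.

```python
from urllib.parse import unquote

# Directory value that Hive and Spark use for a null partition value.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_values(path):
    """Parse Hive-style `name=value` directory segments from a file path.

    Hypothetical helper for illustration. Values are returned as strings
    (or None for the default partition); no type inference is attempted.
    """
    values = {}
    # Skip the last segment, which is the data file name itself.
    for segment in path.split("/")[:-1]:
        if "=" in segment:
            name, _, value = segment.partition("=")
            value = unquote(value)  # special characters are percent-encoded
            values[name] = None if value == HIVE_DEFAULT_PARTITION else value
    return values


def with_partition_columns(rows, path, columns):
    """Return copies of `rows` with the requested partition columns filled in
    from `path`. A column missing from the path is left as None."""
    parts = partition_values(path)
    return [{**row, **{c: parts.get(c) for c in columns}} for row in rows]


# Rows as they might be read from one appended file, which lacks 'c1'.
appended = [{"c2": 1}, {"c2": 2}]
filled = with_partition_columns(appended, "/data/source/c1=a/part-0000.parquet", ["c1"])
```

Under these assumptions, every row in `filled` carries the 'c1' value taken from its file's directory, which is what the refreshed index content would be expected to hold.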

Labels

untriaged (the default tag for a newly created issue)
