This repository was archived by the owner on Jun 14, 2024. It is now read-only.

RefreshAppend should update appended/deleted files list in metadata #183

@apoorvedave1

Describe the issue

Prerequisite PR: PR #170
(PR #170 updates IndexLogEntry and adds appended property to source properties).

PR #170 will introduce lists of appended and deleted files in the index metadata. The current RefreshAppendAction, as implemented in the ongoing PR #163, does not yet take these metadata values into account. It should update them as follows:

  1. Remove files from the appended list if they were picked up for index creation in RefreshAppendAction.
  2. If the deleted list is untouched, carry the deleted files list over unchanged into the new log entry.
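
The two rules above can be sketched in plain Python. This is only an illustration of the intended bookkeeping, not the Hyperspace implementation: `SourceFiles` and `refresh_append_metadata` are hypothetical names, and the real metadata lives in IndexLogEntry.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class SourceFiles:
    """Hypothetical stand-in for the file lists kept in an index log entry."""
    appended: FrozenSet[str]
    deleted: FrozenSet[str]


def refresh_append_metadata(previous: SourceFiles,
                            indexed: FrozenSet[str]) -> SourceFiles:
    """Build the file lists for the new log entry after an append refresh.

    Rule 1: files that were picked up for indexing leave the appended list.
    Rule 2: the deleted list is carried over unchanged.
    """
    remaining = previous.appended - indexed
    return replace(previous, appended=remaining)


# Example: both appended files are indexed, one file was deleted earlier.
prev = SourceFiles(
    appended=frozenset({"part-3.parquet", "part-4.parquet"}),
    deleted=frozenset({"part-1.parquet"}),
)
new = refresh_append_metadata(prev, frozenset(prev.appended))
assert new.appended == frozenset()
assert new.deleted == prev.deleted
```

Using immutable sets here mirrors the idea that each refresh writes a new log entry instead of mutating the previous one.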

To Reproduce

  1. User creates an index on some data e.g., "/path/to/dataset/".
  2. User deletes some files from the original data and adds some new files under "/path/to/dataset/".
  3. User sets the spark.hyperspace.index.refresh.append.enabled flag to true and calls refresh to update the index by indexing the appended files.
    Once refresh finishes successfully, a newer version of the index is created, and the latest metadata shows an updated signature and an empty appended files list. The deleted files list should be the same as in the previous log entry.

Expected behavior

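After refresh, the appended and deleted file lists in the index metadata are handled correctly: files picked up for indexing are removed from the appended list, and the deleted list is carried over unchanged from the previous log entry.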
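
One way to express the expected behavior as a check, for example in a unit test, is sketched below. The function name and its plain-list arguments are assumptions made for illustration; an actual test would read these lists from the previous and latest log entries.

```python
from typing import Iterable, List


def check_refresh_append(prev_appended: Iterable[str],
                         prev_deleted: Iterable[str],
                         new_appended: Iterable[str],
                         new_deleted: Iterable[str],
                         indexed: Iterable[str]) -> List[str]:
    """Return a list of problems found in the new log entry (empty if none)."""
    problems = []
    indexed_set = set(indexed)

    # Files that were indexed must no longer be listed as appended.
    stale = set(new_appended) & indexed_set
    if stale:
        problems.append(f"indexed files still in appended list: {sorted(stale)}")

    # Appended files that were NOT indexed must not be dropped silently.
    lost = (set(prev_appended) - indexed_set) - set(new_appended)
    if lost:
        problems.append(f"unindexed appended files were dropped: {sorted(lost)}")

    # The deleted list must match the previous log entry.
    if set(new_deleted) != set(prev_deleted):
        problems.append("deleted list differs from previous log entry")

    return problems


# A correct refresh: appended list emptied, deleted list unchanged.
assert check_refresh_append(
    prev_appended=["f3", "f4"], prev_deleted=["f1"],
    new_appended=[], new_deleted=["f1"],
    indexed=["f3", "f4"],
) == []
```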

Environment
