
Collect additional metadata while writing manifests #733

@aokolnychyi

To avoid reading manifests to compute the stats needed in #675, we need to collect additional metadata while writing manifests. The minimum required information (the number of added and deleted records) is easy to gather. The main question is how to store it.
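Gathering the counts is cheap because the writer already sees every entry and its status as it appends it. The following is a minimal sketch, assuming a hypothetical `RecordCountTracker` that the writer calls once per entry with the entry's status and the record count of its data file; `EntryStatus`, `RecordCountTracker`, and its methods are illustrative names, not Iceberg APIs.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical entry status, mirroring the added/existing/deleted states of manifest entries. */
enum EntryStatus { ADDED, EXISTING, DELETED }

/** Hypothetical tracker that accumulates file and record counts per status while a manifest is written. */
final class RecordCountTracker {
  private final Map<EntryStatus, Integer> fileCounts = new EnumMap<>(EntryStatus.class);
  private final Map<EntryStatus, Long> recordCounts = new EnumMap<>(EntryStatus.class);

  /** Called once per appended entry; recordCount is the number of records in that entry's data file. */
  void track(EntryStatus status, long recordCount) {
    fileCounts.merge(status, 1, Integer::sum);
    recordCounts.merge(status, recordCount, Long::sum);
  }

  int files(EntryStatus status) {
    return fileCounts.getOrDefault(status, 0);
  }

  long records(EntryStatus status) {
    return recordCounts.getOrDefault(status, 0L);
  }
}

class RecordCountTrackerDemo {
  public static void main(String[] args) {
    RecordCountTracker tracker = new RecordCountTracker();
    tracker.track(EntryStatus.ADDED, 100L);
    tracker.track(EntryStatus.ADDED, 50L);
    tracker.track(EntryStatus.DELETED, 20L);
    System.out.println(tracker.records(EntryStatus.ADDED)); // 150
    System.out.println(tracker.files(EntryStatus.DELETED)); // 1
  }
}
```

The open question in this issue is only where these totals end up once the writer is closed.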

Option 1

We can store the new metadata the same way we store the number of added/deleted/existing files.

Benefits:

  • We can always retrieve the additional information by reading manifests.

Drawbacks:

  • Changes what we actually store on disk.
  • Potentially increases the metadata size.
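A minimal sketch of Option 1, assuming a hypothetical `PersistedManifestSummary` in which the record counts become ordinary persisted fields next to the existing file counts. The class and field names are illustrative only; the point is that every reader of the stored metadata gets the new counts, and that the extra fields are what change the on-disk format and size.

```java
/**
 * Hypothetical Option 1 summary: record counts are persisted next to the file counts
 * that are already stored, so any reader of that metadata can retrieve them.
 */
final class PersistedManifestSummary {
  // File counts that are already stored today.
  final int addedFilesCount;
  final int existingFilesCount;
  final int deletedFilesCount;
  // New record counts; writing these is what changes the on-disk format and size.
  final long addedRecordsCount;
  final long deletedRecordsCount;

  PersistedManifestSummary(int addedFilesCount, int existingFilesCount, int deletedFilesCount,
                           long addedRecordsCount, long deletedRecordsCount) {
    this.addedFilesCount = addedFilesCount;
    this.existingFilesCount = existingFilesCount;
    this.deletedFilesCount = deletedFilesCount;
    this.addedRecordsCount = addedRecordsCount;
    this.deletedRecordsCount = deletedRecordsCount;
  }

  /** Net change in records, computed from the stored counts without opening the manifest itself. */
  long netRecordChange() {
    return addedRecordsCount - deletedRecordsCount;
  }
}

class PersistedManifestSummaryDemo {
  public static void main(String[] args) {
    PersistedManifestSummary summary = new PersistedManifestSummary(2, 5, 1, 150L, 20L);
    System.out.println(summary.netRecordChange()); // 130
  }
}
```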

Option 2

We can introduce a serializable field of type Map<String, String> but not store it on disk. Only ManifestFile instances created via ManifestWriter are guaranteed to contain all the needed properties.

Benefits:

  • The metadata on disk doesn't change.
  • The metadata size stays the same.

Drawbacks:

  • We cannot retrieve the additional information by reading manifests.
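A minimal sketch of Option 2, assuming a hypothetical `InMemoryManifestFile` whose `Map<String, String>` properties survive Java serialization (for example, when an instance is shipped between processes) but are never written to the stored metadata. The class name, factory methods, and the `added-records` key are illustrative assumptions, not Iceberg APIs. An instance produced by the writer carries the counts; one reconstructed from disk does not.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical Option 2 manifest descriptor with in-memory-only properties. */
final class InMemoryManifestFile implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String path;
  // Serializable, but never part of the on-disk metadata, so only the writer populates it.
  private final Map<String, String> properties;

  private InMemoryManifestFile(String path, Map<String, String> properties) {
    this.path = path;
    this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
  }

  /** What a writer would produce: all properties are present. */
  static InMemoryManifestFile fromWriter(String path, long addedRecords, long deletedRecords) {
    Map<String, String> props = new HashMap<>();
    props.put("added-records", Long.toString(addedRecords));
    props.put("deleted-records", Long.toString(deletedRecords));
    return new InMemoryManifestFile(path, props);
  }

  /** What reading the stored metadata would produce: the properties are lost. */
  static InMemoryManifestFile fromDisk(String path) {
    return new InMemoryManifestFile(path, Collections.emptyMap());
  }

  String path() {
    return path;
  }

  /** Empty for instances read from disk, so callers must fall back to reading the manifest. */
  Optional<Long> addedRecords() {
    String value = properties.get("added-records");
    return value == null ? Optional.empty() : Optional.of(Long.parseLong(value));
  }
}

class InMemoryManifestFileDemo {
  public static void main(String[] args) {
    InMemoryManifestFile written = InMemoryManifestFile.fromWriter("m1.avro", 150L, 20L);
    InMemoryManifestFile read = InMemoryManifestFile.fromDisk("m1.avro");
    System.out.println(written.addedRecords()); // Optional[150]
    System.out.println(read.addedRecords());    // Optional.empty
  }
}
```

Returning `Optional` makes the drawback explicit at the call site: code that needs the counts has to handle the case where the instance did not come from a writer.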
