[Feature] Don't store redundant primary key columns #2893

@zhongyujiang

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

I found that Paimon redundantly stores the primary key columns in data files, under names prefixed with _KEY_. This increases both query overhead and storage size, especially when a table has several primary key fields, for example 5 or more.

Would it be possible to project the primary key directly from the original column values at query time? I think that would reduce the read and write load as well as the storage volume.

Solution

Generate primary key values directly from the original columns instead of using redundant primary key columns.
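
To make the idea concrete, here is a minimal sketch in Python. It is not Paimon's code or API: the schema, the field names, and the helpers `key_projection`, `project_key`, and `redundant_layout` are all hypothetical, and the sketch ignores any other system columns a real data file may carry. It only illustrates that the key can be rebuilt from the value columns by position, so the stored key copy carries no extra information.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical table schema, for illustration only (not taken from Paimon).
FIELD_NAMES = ["order_id", "region", "amount"]
PRIMARY_KEYS = ["order_id", "region"]


def key_projection(field_names: Sequence[str], primary_keys: Sequence[str]) -> List[int]:
    """Return the position of each primary key field within a value row."""
    return [field_names.index(name) for name in primary_keys]


def project_key(row: Sequence[Any], projection: Sequence[int]) -> Tuple[Any, ...]:
    """Build the key from the original column values, with no stored copy."""
    return tuple(row[i] for i in projection)


def redundant_layout(row: Sequence[Any], projection: Sequence[int]) -> Tuple[Any, ...]:
    """Simplified version of the layout described in this issue:
    the key columns are written again in front of the value columns."""
    return project_key(row, projection) + tuple(row)


row = (1001, "eu", 25.0)
projection = key_projection(FIELD_NAMES, PRIMARY_KEYS)

stored = redundant_layout(row, projection)  # key columns appear twice
key = project_key(row, projection)          # key derived on the fly
```

In this sketch the stored row has `len(PRIMARY_KEYS)` extra columns per record, and `key` equals the leading columns of `stored`, which is the redundancy this issue proposes to remove.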

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

Labels

enhancement: New feature or request
