[Feature] Support tag #1269

@yuzelin

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Paimon's snapshots make it possible to query historical data without the cost of merging. Paimon's storage generates a snapshot for each commit, so historical data can be read at any snapshot (commonly called a time travel query in SQL).

In most scenarios, however, a table generates too many snapshots, so Paimon lets users configure a snapshot expiration time to clean up old, unused snapshots. As a result, the snapshot for a specified point in time may already have expired when a user wants to query it.

To solve this problem, we propose introducing a new mechanism: Tag. A tag is created from a snapshot and can be retained longer than the snapshot itself. The purposes of tags are:

  • Fault recovery (or disaster recovery). Users can ROLL BACK to a tag if needed. If a user rolls back to a tag, the table keeps the data in the tag, and the data committed after the tag is deleted. In this scenario the tag acts like a backup, but is more lightweight because it only records the metadata of data files instead of copying them.

  • Record versions of data at a longer interval (typically daily or weekly). With tags, users can query old data in batch mode. In this scenario the tag acts like a long-lived snapshot.
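
To make the two purposes above concrete, here is a toy in-memory model of the proposed behavior. It is only a sketch under stated assumptions, not Paimon's API or the linked design: `ToyTable` and all of its method names are hypothetical, and a snapshot is reduced to a tuple of data file names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical toy model of snapshots and tags; not Paimon's API."""
    snapshots: dict = field(default_factory=dict)  # snapshot id -> data file names
    tags: dict = field(default_factory=dict)       # tag name -> (snapshot id, data file names)
    latest_id: int = 0

    def commit(self, *new_files):
        # Each commit produces a new snapshot on top of the latest one.
        base = self.snapshots.get(self.latest_id, ())
        self.latest_id += 1
        self.snapshots[self.latest_id] = base + tuple(new_files)
        return self.latest_id

    def create_tag(self, name, snapshot_id):
        # A tag records only file metadata (names here); no data is copied.
        self.tags[name] = (snapshot_id, self.snapshots[snapshot_id])

    def expire_snapshots(self, retain_last):
        # Expiration removes old snapshots but leaves tags untouched.
        if retain_last < 1:
            raise ValueError("retain_last must be at least 1")
        for sid in sorted(self.snapshots)[:-retain_last]:
            del self.snapshots[sid]

    def read_tag(self, name):
        # Batch read of the data as of the tagged snapshot.
        return self.tags[name][1]

    def rollback_to_tag(self, name):
        # Drop everything committed after the tag and restore the tagged state.
        snapshot_id, files = self.tags[name]
        for sid in [s for s in self.snapshots if s > snapshot_id]:
            del self.snapshots[sid]
        self.snapshots[snapshot_id] = files
        self.latest_id = snapshot_id


table = ToyTable()
table.commit("f1")
table.commit("f2")
table.create_tag("daily", 2)
table.commit("f3")
table.commit("f4")
table.expire_snapshots(retain_last=1)   # snapshot 2 is gone, the tag is not
tagged_files = table.read_tag("daily")  # still readable after expiration
table.rollback_to_tag("daily")          # data committed after the tag is dropped
```

In this sketch the tag outlives the snapshot it was created from, which is the long-lived snapshot use case, and the rollback discards `f3` and `f4`, which is the fault recovery use case.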

Solution

Design: https://cwiki.apache.org/confluence/x/NxE0Dw

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels: enhancement (New feature or request)
