Open
Labels: enhancement (New feature or request)
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Compaction is essential for maintaining high performance and storage efficiency in modern data systems. Key benefits include:
- For Append Tables: Reduces small files by merging existing data files, improving scan performance and metadata scalability.
- For Primary Key (PK) Tables: Minimizes the number of segments that need to be merged during read-time (merge-on-read), significantly speeding up queries.
- For PK+DV Tables: Enables writing DV (Delete Vector) files to mark outdated rows, allowing efficient read performance.
Currently, the lack of a dedicated compaction mechanism limits our ability to optimize storage layout and query latency.
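To illustrate the PK-table point, here is a toy sketch of why fewer segments means cheaper reads. It is not Paimon code: the `Row` layout and the `merge_on_read` and `compact` helpers are hypothetical names made up for this example, and a real implementation would use a streaming sort-merge over sorted runs rather than an in-memory dict.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout for illustration: (primary_key, sequence_number, value)
Row = Tuple[int, int, str]


def _latest_per_key(runs: List[List[Row]]) -> Dict[int, Tuple[int, str]]:
    """Keep, for every key, the row with the highest sequence number."""
    latest: Dict[int, Tuple[int, str]] = {}
    for run in runs:
        for key, seq, value in run:
            if key not in latest or seq > latest[key][0]:
                latest[key] = (seq, value)
    return latest


def merge_on_read(runs: List[List[Row]]) -> Dict[int, str]:
    """Read path: every run must be visited to resolve the current value of each key."""
    return {key: value for key, (_, value) in sorted(_latest_per_key(runs).items())}


def compact(runs: List[List[Row]]) -> List[List[Row]]:
    """Rewrite all runs into a single key-sorted run without outdated rows."""
    latest = _latest_per_key(runs)
    return [[(key, seq, value) for key, (seq, value) in sorted(latest.items())]]


runs = [
    [(1, 1, "a"), (2, 1, "b")],  # oldest segment
    [(1, 2, "a2")],              # newer segment overwrites key 1
    [(3, 3, "c")],
]
compacted = compact(runs)
```

Reading `compacted` gives the same result as reading `runs`, but the reader touches one segment instead of three.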
Solution
The compaction framework should support the following capabilities:
- Support for both append tables and primary key (PK) tables, with appropriate strategies for each;
- Execution via background tasks or manual triggers, allowing flexibility in operation;
- Built-in basic compaction policies aligned with Java Paimon;
- Generation of Delete Vector (DV) files during/after compaction to track stale rows;
- Design support for data-evolution scenarios, including both vertical compaction (merging small files) and horizontal compaction (consolidating partial-column files);
- Output data format that is fully compatible with Java Paimon.
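As a rough sketch of what a basic append-table policy could look like, the snippet below groups small files into compaction tasks. Everything here is an assumption for discussion: `DataFile`, `plan_append_compaction`, `target_size`, and `min_files` are hypothetical names, and the greedy packing is not claimed to match the policy Java Paimon uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    # Hypothetical file descriptor; a real manifest entry carries far more metadata.
    name: str
    size_bytes: int


def plan_append_compaction(
    files: List[DataFile], target_size: int, min_files: int = 2
) -> List[List[DataFile]]:
    """Greedily pack files smaller than target_size into groups of at most target_size.

    Groups with fewer than min_files members are dropped, since rewriting a
    single file would not reduce the file count.
    """
    small = sorted(
        (f for f in files if f.size_bytes < target_size),
        key=lambda f: f.size_bytes,
    )
    groups: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_size = 0
    for f in small:
        # Close the current group once adding this file would exceed the target.
        if current and current_size + f.size_bytes > target_size:
            if len(current) >= min_files:
                groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if len(current) >= min_files:
        groups.append(current)
    return groups


files = [
    DataFile("a", 10),
    DataFile("b", 20),
    DataFile("c", 200),  # already at or above target, left alone
    DataFile("d", 30),
    DataFile("e", 90),
]
plan = plan_append_compaction(files, target_size=100)
```

The same planning function could be called from a background task or from a manual trigger; only the scheduling differs.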
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!
ChaomingZhangCN