Up to 20x: Binary Copy Capability For Compaction #5433

# Overview
In Lance’s current compaction process, the data files of the fragments to be merged are fully decompressed and deserialized into Arrow format, then re-serialized and compressed to produce a few larger fragments. However, in the simple “merge small files” scenario, we can directly copy the pages within data files at the binary level (binary copy) and reconstruct the metadata to form a larger fragment. This bypasses page data parsing and significantly improves the performance of small-file merging.
Note that binary copy does not inspect data distribution within pages and therefore cannot handle fragments that contain a deletion vector. Consequently, during compaction, if a fragment is found to include a deletion file, the process should fall back to the original full read/write compaction approach.

<img width="4660" height="2054" alt="Image" src="https://github.com/user-attachments/assets/385a19c0-2684-4404-ae8f-dda3ce2e7885" />

# Design
## High-Level Flow

<img width="3056" height="746" alt="Image" src="https://github.com/user-attachments/assets/047acae6-d0ec-4f51-91eb-37aa763c5d9e" />

When `compact_files` is executed, it calls `can_use_binary_copy()` to decide whether this optimized path can be used. The path is taken only when all of the following conditions are met:

- **Feature switch**: `CompactionOptions.enable_binary_copy` must be set to `true`. This master switch is off by default to ensure backward compatibility and stability.
- **Version constraints**: All data files (`*.lance`) in all fragments to be merged must have exactly the same file version (major/minor). For example, a file written with v2.0 cannot be merged with a file written with v2.1.
- **No deletion files**: None of the fragments to be merged may have an associated deletion file (`deletion_file`). The binary copy path does not materialize deleted rows; it only transfers data.
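The three eligibility checks can be sketched as follows. This is a minimal illustration, not Lance's code: `Fragment`, `DataFile`, `FileVersion`, and the function signature are hypothetical simplifications. Only `CompactionOptions.enable_binary_copy` and the name `can_use_binary_copy()` come from this issue. Returning `false` for an empty input is also an assumption made for the sketch.

```rust
/// Hypothetical, simplified stand-ins for Lance's fragment metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileVersion {
    major: u32,
    minor: u32,
}

#[derive(Debug, Clone)]
struct DataFile {
    version: FileVersion,
}

#[derive(Debug, Clone)]
struct Fragment {
    files: Vec<DataFile>,
    has_deletion_file: bool,
}

#[derive(Debug, Clone, Default)]
struct CompactionOptions {
    enable_binary_copy: bool, // defaults to false, like the real master switch
}

/// Returns true only when every condition for the binary-copy path holds.
fn can_use_binary_copy(options: &CompactionOptions, fragments: &[Fragment]) -> bool {
    // 1. Master switch, off by default.
    if !options.enable_binary_copy {
        return false;
    }
    // 2. No fragment may carry a deletion file.
    if fragments.iter().any(|f| f.has_deletion_file) {
        return false;
    }
    // 3. Every data file must share exactly the same major/minor version.
    let mut versions = fragments
        .iter()
        .flat_map(|f| f.files.iter().map(|d| d.version));
    match versions.next() {
        Some(first) => versions.all(|v| v == first),
        None => false, // assumption: nothing to copy, so use the normal path
    }
}

fn main() {
    let v2_0 = FileVersion { major: 2, minor: 0 };
    let v2_1 = FileVersion { major: 2, minor: 1 };
    let frag = |v, del| Fragment { files: vec![DataFile { version: v }], has_deletion_file: del };
    let on = CompactionOptions { enable_binary_copy: true };

    assert!(can_use_binary_copy(&on, &[frag(v2_1, false), frag(v2_1, false)]));
    assert!(!can_use_binary_copy(&CompactionOptions::default(), &[frag(v2_1, false)]));
    assert!(!can_use_binary_copy(&on, &[frag(v2_0, false), frag(v2_1, false)]));
    assert!(!can_use_binary_copy(&on, &[frag(v2_1, true), frag(v2_1, false)]));
    println!("ok");
}
```

The deletion check runs before the version scan because it is cheaper. Either order gives the same result, since all conditions must hold.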
## Binary Copy Details

<img width="343" height="622" alt="Image" src="https://github.com/user-attachments/assets/3ec4eb91-569f-49be-90b0-ee8100221295" />

The core logic resides in `rewrite_files_binary_copy`. It operates at the page and buffer level, stitching together new data files from the pieces of the old ones.
- **I/O Batching**: To optimize I/O, the algorithm reads data in large, coalesced batches. It iterates through the pages of each column in each source file, grouping ranges of buffers to be read in a single request. The size of these batches is controlled by `binary_copy_read_batch_bytes`.
- **64-Byte Alignment**: All data copied to the new file is aligned to a 64-byte boundary. This is a requirement for modern Lance file versions (v2.1+) and ensures optimal I/O performance. Padding bytes are inserted as needed.
- **Column-Level Buffer Copying**: In addition to page data, any column-level buffers (metadata or dictionaries stored outside of pages) are also copied with alignment.
- **Footer Finalization**: After all data is copied to a new file, the footer is written using the standard `FileWriter::finish()` method. This function takes the accumulated column metadata, page tables, and total row count, and writes the final manifest block, including schema, offset tables, and version information. This ensures the output file is a valid and well-formed Lance file.
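The I/O batching and 64-byte alignment steps can be sketched together. Everything in this block is hypothetical: `BufferRange`, `plan_read_batches`, `batch_span`, and `layout_output` are illustrative names. The batch limit plays the role of `binary_copy_read_batch_bytes`. The sketch assumes source buffers are listed in file order, so that one batch can be fetched as a single contiguous range.

```rust
/// A buffer inside a source data file (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BufferRange {
    offset: u64,
    len: u64,
}

const ALIGNMENT: u64 = 64;

/// Number of zero bytes needed to move `pos` up to the next 64-byte boundary.
fn padding_for(pos: u64) -> u64 {
    (ALIGNMENT - pos % ALIGNMENT) % ALIGNMENT
}

/// Groups consecutive buffers into batches whose total size stays within
/// `batch_bytes`. A single buffer larger than the limit forms its own batch.
fn plan_read_batches(buffers: &[BufferRange], batch_bytes: u64) -> Vec<Vec<BufferRange>> {
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut current_bytes = 0u64;
    for &buf in buffers {
        if !current.is_empty() && current_bytes + buf.len > batch_bytes {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += buf.len;
        current.push(buf);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

/// Byte range [start, end) covering one batch, fetched with a single request.
/// Assumes buffers are sorted by offset; any gap bytes are read and discarded.
fn batch_span(batch: &[BufferRange]) -> (u64, u64) {
    let first = batch.first().expect("batch is non-empty");
    let last = batch.last().expect("batch is non-empty");
    (first.offset, last.offset + last.len)
}

/// Computes where each copied buffer lands in the output file. Padding is
/// inserted so that every buffer starts on a 64-byte boundary.
/// Returns the new offsets and the position just past the last buffer.
fn layout_output(buffers: &[BufferRange], start: u64) -> (Vec<u64>, u64) {
    let mut pos = start;
    let mut new_offsets = Vec::with_capacity(buffers.len());
    for buf in buffers {
        pos += padding_for(pos);
        new_offsets.push(pos);
        pos += buf.len;
    }
    (new_offsets, pos)
}

fn main() {
    let buffers = [
        BufferRange { offset: 0, len: 100 },
        BufferRange { offset: 100, len: 30 },
        BufferRange { offset: 130, len: 200 },
        BufferRange { offset: 330, len: 10 },
    ];
    let batches = plan_read_batches(&buffers, 256);
    let spans: Vec<_> = batches.iter().map(|b| batch_span(b)).collect();
    let (offsets, end) = layout_output(&buffers, 0);
    // prints: spans [(0, 130), (130, 340)], offsets [0, 128, 192, 448], end 458
    println!("spans {:?}, offsets {:?}, end {}", spans, offsets, end);
}
```

In this sketch, batching and output layout are planned independently. Reads are sized for I/O efficiency, while padding depends only on where each buffer lands in the new file.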
## Row IDs and Index Interactions
- **Stable Row IDs**:
	- **When enabled**: If the dataset has `stable_row_ids` enabled, binary copy inherits the row ID sequences (`RowIdSequence`) of the old fragments unchanged. It reads the row IDs of all old fragments, rechunks them at the row-count boundaries of the new fragments, and stores the new row ID sequences either inline or externally in the new fragments' metadata.
	- **When disabled**: If the dataset uses address-based row IDs (fragment_id + offset), Lance must generate a complete mapping from every old row ID to its new row ID. For binary copy, earlier checks have already guaranteed that none of the fragments being processed have deletion files. It therefore only needs to simulate a `row_id_rx` built from each fragment's ID and row count.
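Both row ID cases can be sketched in Rust. This is an illustration under stated assumptions, not Lance's implementation. Row ID sequences are modeled as plain `Vec<u64>` rather than the real `RowIdSequence`. The helper names `row_address`, `rechunk_row_ids`, and `remap_addresses` are hypothetical. The address layout, with the fragment ID in the high 32 bits and the row offset in the low 32 bits, is also an assumption here. The sketch does not reproduce `row_id_rx` itself.

```rust
use std::collections::HashMap;

/// Packs a fragment id and row offset into a 64-bit row address
/// (assumed layout: fragment id in the high 32 bits, offset in the low 32).
fn row_address(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Stable row IDs: concatenate the old sequences and re-split them at the
/// row-count boundaries of the new fragments, without changing any ID.
/// Assumes the new row counts add up to the total number of old rows.
fn rechunk_row_ids(old: &[Vec<u64>], new_row_counts: &[usize]) -> Vec<Vec<u64>> {
    let mut all = old.iter().flatten().copied();
    new_row_counts
        .iter()
        .map(|&n| all.by_ref().take(n).collect())
        .collect()
}

/// Address-based row IDs: with no deletion files, each fragment is fully
/// described by (fragment id, row count). Walking the old rows in order and
/// filling the new fragments sequentially yields the old -> new mapping.
fn remap_addresses(old: &[(u32, u32)], new: &[(u32, u32)]) -> HashMap<u64, u64> {
    let mut mapping = HashMap::new();
    let mut targets = new
        .iter()
        .flat_map(|&(id, rows)| (0..rows).map(move |off| row_address(id, off)));
    for &(id, rows) in old {
        for off in 0..rows {
            let dst = targets
                .next()
                .expect("new fragments must hold the same number of rows");
            mapping.insert(row_address(id, off), dst);
        }
    }
    mapping
}

fn main() {
    let rechunked = rechunk_row_ids(&[vec![10, 11, 12], vec![20, 21]], &[4, 1]);
    println!("{:?}", rechunked); // [[10, 11, 12, 20], [21]]

    // Old fragments 1 (3 rows) and 2 (2 rows) merged into new fragment 7.
    let mapping = remap_addresses(&[(1, 3), (2, 2)], &[(7, 5)]);
    println!("{} rows remapped", mapping.len()); // 5 rows remapped
}
```

Neither path inspects page contents, which is why both fit binary copy. The only inputs are the old row IDs or the per-fragment row counts.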

# Compatibility Issues
This is a brand-new feature with rollback capability, so it introduces no historical compatibility issues.
