
Up to 20x: Binary Copy Capability For Compaction #5433

@zhangyue19921010


Overview

In Lance’s current compaction process, the data files of the fragments being merged are fully decompressed and deserialized into Arrow format, then re-serialized and compressed into a few larger fragments. In the simple “merge small files” scenario, however, we can copy the pages within the data files directly at the binary level (binary copy) and rebuild only the metadata to form a larger fragment. This bypasses page data parsing and significantly improves small-file merge performance.
Note that binary copy does not inspect the contents of pages, so it cannot drop deleted rows and therefore cannot handle fragments that have a deletion vector. If any fragment selected for compaction includes a deletion file, the process should fall back to the original full read/write compaction path.


Design

High-Level Flow


When compact_files is executed, it calls can_use_binary_copy() to determine whether this optimized path can be used. Binary copy is used only when all of the following conditions are met:
• Feature switch: CompactionOptions.enable_binary_copy must be set to true. This master switch is off by default to ensure backward compatibility and stability.
• Version constraint: All data files (*.lance) in all Fragments to be merged must have exactly the same file version (major and minor). For example, a file written with version v2.0 cannot be merged with a file written with version v2.1.
• No deletion files: None of the Fragments to be merged may have an associated deletion file (deletion_file). The Binary Copy path only transfers data; it does not materialize deleted rows.
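The three conditions above can be sketched as a single predicate. This is a minimal illustration in Python rather than the actual Rust implementation. The `DataFile` and `Fragment` classes and their fields are hypothetical stand-ins, and only the names `can_use_binary_copy`, `enable_binary_copy`, and `deletion_file` come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataFile:
    """Hypothetical stand-in for a *.lance data file."""
    path: str
    version: Tuple[int, int]  # (major, minor) file format version


@dataclass
class Fragment:
    """Hypothetical stand-in for a fragment."""
    files: List[DataFile]
    deletion_file: Optional[str] = None


@dataclass
class CompactionOptions:
    enable_binary_copy: bool = False  # master switch, off by default


def can_use_binary_copy(options: CompactionOptions,
                        fragments: List[Fragment]) -> bool:
    # 1. Feature switch must be on.
    if not options.enable_binary_copy:
        return False
    # 2. No fragment may carry a deletion file.
    if any(f.deletion_file is not None for f in fragments):
        return False
    # 3. Every data file must share exactly one (major, minor) version.
    versions = {df.version for f in fragments for df in f.files}
    return len(versions) == 1
```

When the predicate returns false, the caller would take the existing full read/write compaction path.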

Binary Copy Details


The core logic resides in rewrite_files_binary_copy. It operates at the page and buffer level, stitching together new data files from the pieces of the old ones.

  • I/O Batching: To optimize I/O, the algorithm reads data in large, coalesced batches. It iterates through the pages of each column in each source file, grouping ranges of buffers to be read in a single request. The size of these batches is controlled by binary_copy_read_batch_bytes.
  • 64-Byte Alignment: All data copied to the new file is aligned to a 64-byte boundary. This is a requirement for modern Lance file versions (v2.1+) and ensures optimal I/O performance. Padding bytes are inserted as needed.
  • Column-Level Buffer Copying: In addition to page data, any column-level buffers (metadata or dictionaries stored outside of pages) are also copied with alignment.
  • Footer Finalization: After all data is copied to a new file, the footer is written using the standard FileWriter::finish() method. This function takes the accumulated column metadata, page tables, and total row count, and writes the final manifest block, including schema, offset tables, and version information. This ensures the output file is a valid and well-formed Lance file.
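The batching and alignment steps above can be illustrated with a small Python sketch. It is not the logic of rewrite_files_binary_copy itself: the helper names, the in-memory `src` bytes standing in for a source file, the zero-byte padding, and the rule for closing a batch are all assumptions made for illustration. Only the 64-byte alignment and the idea of a byte budget per read (binary_copy_read_batch_bytes) come from the text.

```python
import io
from typing import List, Tuple

ALIGNMENT = 64  # alignment boundary named in the text

Range = Tuple[int, int]  # (offset, length) of one buffer in the source file


def coalesce_ranges(ranges: List[Range], max_batch_bytes: int) -> List[List[Range]]:
    """Group buffer ranges, in file order, into read batches.

    A batch is closed when adding the next range would make the batch span
    more than max_batch_bytes. An oversized range gets a batch of its own.
    """
    batches: List[List[Range]] = []
    current: List[Range] = []
    start = 0
    for offset, length in sorted(ranges):
        if current and (offset + length - start) > max_batch_bytes:
            batches.append(current)
            current = []
        if not current:
            start = offset
        current.append((offset, length))
    if current:
        batches.append(current)
    return batches


def write_aligned(out, payload: bytes) -> int:
    """Pad `out` to the next 64-byte boundary, write payload, return its offset."""
    pos = out.tell()
    padding = (-pos) % ALIGNMENT
    out.write(b"\x00" * padding)  # padding content is an assumption
    out.write(payload)
    return pos + padding


def copy_buffers(src: bytes, ranges: List[Range], out,
                 max_batch_bytes: int) -> List[Range]:
    """Copy buffers from src to out; return new (offset, length) in file order."""
    new_locations: List[Range] = []
    for batch in coalesce_ranges(ranges, max_batch_bytes):
        lo = batch[0][0]
        hi = max(o + n for o, n in batch)
        chunk = src[lo:hi]  # stands in for one coalesced read request
        for offset, length in batch:
            payload = chunk[offset - lo: offset - lo + length]
            new_locations.append((write_aligned(out, payload), length))
    return new_locations
```

The returned locations are what a page table for the new file would need to record, since every buffer moves to a new, aligned offset.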

Row IDs and Index Interactions

  • Stable Row IDs:
    • When enabled: If the dataset has stable_row_ids enabled, the Binary Copy function must carry over the row ID sequences (RowIdSequence) from the old Fragments unchanged. It reads the row IDs of all old Fragments, rechunks them according to the row count boundaries of the new Fragments, and stores the new row ID sequences either inline or externally in the metadata of the new Fragments.
    • When disabled: If the dataset uses address-based row IDs (fragment_id + offset), Lance must generate a complete mapping from every old row ID to its new row ID. Because the binary copy path has already ensured that the fragments being processed have no deletion files, it only needs to simulate a row_id_rx, which consists of the fragment ID and row count.
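Both cases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Lance's implementation: row ID sequences are modelled as plain lists, the function names are hypothetical, and the address layout (fragment ID in the upper 32 bits, row offset in the lower 32 bits) is assumed here for concreteness.

```python
from typing import Dict, List, Tuple


def rechunk_row_ids(old_sequences: List[List[int]],
                    new_row_counts: List[int]) -> List[List[int]]:
    """Stable row IDs: keep IDs unchanged, re-split at new fragment boundaries."""
    flat = [rid for seq in old_sequences for rid in seq]
    if sum(new_row_counts) != len(flat):
        raise ValueError("new fragments must hold exactly the old rows")
    out, pos = [], 0
    for count in new_row_counts:
        out.append(flat[pos:pos + count])
        pos += count
    return out


def row_address(fragment_id: int, offset: int) -> int:
    """Assumed layout: fragment ID in the high 32 bits, offset in the low 32."""
    return (fragment_id << 32) | offset


def address_mapping(old_fragments: List[Tuple[int, int]],
                    new_fragments: List[Tuple[int, int]]) -> Dict[int, int]:
    """Address-based row IDs: map every old address to its new address.

    Both arguments are (fragment_id, row_count) pairs in copy order. With no
    deletions, every offset in 0..row_count exists, so the pairs alone are
    enough to enumerate all rows.
    """
    if sum(n for _, n in old_fragments) != sum(n for _, n in new_fragments):
        raise ValueError("row counts of old and new fragments differ")
    new_addrs = (row_address(fid, off)
                 for fid, n in new_fragments for off in range(n))
    mapping: Dict[int, int] = {}
    for fid, n in old_fragments:
        for off in range(n):
            mapping[row_address(fid, off)] = next(new_addrs)
    return mapping
```

The second function shows why the absence of deletion files matters: the mapping can be derived from fragment IDs and row counts alone, without reading any row data.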

Compatibility Issues

This is a brand-new feature with rollback capability, so it has no historical compatibility issues.
