Merged
Commits
29 commits
5c70543
feat: binary copy for compaction
zhangyue19921010 Dec 8, 2025
79dc823
feat: binary copy for compaction
zhangyue19921010 Dec 8, 2025
36d0a9c
feat: binary copy for compaction
zhangyue19921010 Dec 8, 2025
a7711f8
feat: binary copy for compaction
zhangyue19921010 Dec 8, 2025
a5bd940
feat: binary copy for compaction
zhangyue19921010 Dec 8, 2025
e0bc0f9
feat: binary copy for compaction
zhangyue19921010 Dec 8, 2025
c683974
feat: binary copy for compaction
zhangyue19921010 Dec 9, 2025
0e6d74b
code review
zhangyue19921010 Dec 11, 2025
041a79b
code review
zhangyue19921010 Dec 11, 2025
cf46643
code review
zhangyue19921010 Dec 11, 2025
b3a748e
Merge branch 'main' into binary-copy-final
zhangyue19921010 Dec 11, 2025
9646e51
code review
zhangyue19921010 Dec 11, 2025
582074a
code review
zhangyue19921010 Dec 11, 2025
a352b16
bug fix
zhangyue19921010 Dec 16, 2025
2faa848
bug fix
zhangyue19921010 Dec 16, 2025
c8255ff
bug fix
zhangyue19921010 Dec 17, 2025
f9d10b0
Merge branch 'main' into binary-copy-final
zhangyue19921010 Dec 18, 2025
ab78372
Verify the consistency of column buffer encoding
zhangyue19921010 Dec 22, 2025
5b74fb7
code review
zhangyue19921010 Jan 5, 2026
d35a130
code review
zhangyue19921010 Jan 6, 2026
902839d
code review
zhangyue19921010 Jan 13, 2026
651a52c
Merge branch 'main' into binary-copy-final
zhangyue19921010 Jan 13, 2026
1f7c558
code review
zhangyue19921010 Jan 13, 2026
7492c4d
code review
zhangyue19921010 Jan 13, 2026
9f0ac0f
code review
zhangyue19921010 Jan 13, 2026
ed5aed2
Merge branch 'binary-copy-final-bak-0113' into binary-copy-final
zhangyue19921010 Jan 13, 2026
7c396ab
code review
zhangyue19921010 Jan 29, 2026
2ece222
Merge branch 'main' into binary-copy-final
zhangyue19921010 Jan 29, 2026
65befe6
code review
zhangyue19921010 Jan 29, 2026
24 changes: 23 additions & 1 deletion rust/lance-file/src/writer.rs
@@ -495,6 +495,26 @@ impl FileWriter {
         self.schema_metadata.insert(key.into(), value.into());
     }
 
+    /// Prepare the writer when column data and metadata were produced externally.
+    ///
+    /// This is useful for flows that copy already-encoded pages (e.g., binary copy
+    /// during compaction) where the column buffers have been written directly and we
+    /// only need to write the footer and schema metadata. The provided
+    /// `column_metadata` must describe the buffers already persisted by the
+    /// underlying `ObjectWriter`, and `rows_written` should reflect the total number
+    /// of rows in those buffers.
+    pub fn initialize_with_external_metadata(
+        &mut self,
+        schema: lance_core::datatypes::Schema,
+        column_metadata: Vec<pbfile::ColumnMetadata>,
+        rows_written: u64,
+    ) {
+        self.schema = Some(schema);
+        self.num_columns = column_metadata.len() as u32;
+        self.column_metadata = column_metadata;
+        self.rows_written = rows_written;
+    }
zhangyue19921010 marked this conversation as resolved.

     /// Adds a global buffer to the file
     ///
     /// The global buffer can contain any arbitrary bytes. It will be written to the disk
@@ -595,7 +615,9 @@ impl FileWriter {
             .collect::<FuturesOrdered<_>>();
         self.write_pages(encoding_tasks).await?;
 
-        self.finish_writers().await?;
+        if !self.column_writers.is_empty() {
+            self.finish_writers().await?;
+        }
Comment on lines +618 to +620
Member

Why is this change needed?

Contributor Author

In binary copy, the column data and metadata are written ahead of time by copying them directly, without going through FileWriter's normal write path. When flushing the footer, however, we want to reuse as much existing code as possible, so we mock a FileWriter and call its finish method to trigger the footer write. In that case column_writers is empty, so this check is needed to skip finish_writers in binary-copy-like scenarios.
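
The guard discussed in this thread can be modeled with a small, self-contained sketch. Everything in it (`ToyWriter`, `ToyColumnMetadata`, the footer string, the `finish_writers_calls` counter) is a hypothetical stand-in and not the lance-file API. It assumes only what the comment above states: the binary-copy path hands pre-written metadata to a writer that has no column writers, then calls finish to emit the footer.

```rust
// Toy model of the flow described in the review thread; NOT the lance-file API.
// All type and field names below are hypothetical.

#[derive(Debug)]
struct ToyColumnMetadata {
    /// Length in bytes of one already-persisted column buffer (made-up layout).
    page_len: u64,
}

#[derive(Default)]
struct ToyWriter {
    /// Stand-in for per-column encoders; empty on the binary-copy path.
    column_writers: Vec<String>,
    column_metadata: Vec<ToyColumnMetadata>,
    num_columns: u32,
    rows_written: u64,
    /// Counts how often the per-column finalization step ran.
    finish_writers_calls: u32,
}

impl ToyWriter {
    /// Adopts metadata for buffers that another code path already wrote,
    /// mirroring the shape of `initialize_with_external_metadata` in the diff.
    fn initialize_with_external_metadata(
        &mut self,
        column_metadata: Vec<ToyColumnMetadata>,
        rows_written: u64,
    ) {
        self.num_columns = column_metadata.len() as u32;
        self.column_metadata = column_metadata;
        self.rows_written = rows_written;
    }

    fn finish_writers(&mut self) {
        self.finish_writers_calls += 1;
    }

    /// Emits a textual "footer"; per-column finalization is skipped
    /// when there are no column writers to finalize.
    fn finish(&mut self) -> String {
        if !self.column_writers.is_empty() {
            self.finish_writers();
        }
        let bytes: u64 = self.column_metadata.iter().map(|c| c.page_len).sum();
        format!(
            "footer: columns={} rows={} bytes={}",
            self.num_columns, self.rows_written, bytes
        )
    }
}

/// Binary-copy path: metadata supplied externally, no column writers.
fn binary_copy_footer() -> (String, u32) {
    let mut writer = ToyWriter::default();
    let metadata = vec![
        ToyColumnMetadata { page_len: 4096 },
        ToyColumnMetadata { page_len: 8192 },
    ];
    writer.initialize_with_external_metadata(metadata, 1000);
    let footer = writer.finish();
    (footer, writer.finish_writers_calls)
}

/// Normal path: the writer owns column writers, so they are finalized once.
fn normal_path_calls() -> u32 {
    let mut writer = ToyWriter {
        column_writers: vec!["col0".to_string()],
        ..ToyWriter::default()
    };
    writer.finish();
    writer.finish_writers_calls
}

fn main() {
    let (footer, calls) = binary_copy_footer();
    println!("{footer} (finish_writers calls: {calls})");
    println!("normal path finish_writers calls: {}", normal_path_calls());
}
```

In this model the binary-copy path still produces a footer from the adopted metadata, while the per-column finalization step runs only when column writers exist. Whether the real `finish_writers` would fail or merely do nothing with an empty `column_writers` is not stated in the thread.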


         // 3. write global buffers (we write the schema here)
         let global_buffer_offsets = self.write_global_buffers().await?;