
feat(blob_v2): add GC support #5473

Merged
Xuanwo merged 2 commits into main from xuanwo/blobv2-gc on Dec 16, 2025

@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Dec 15, 2025

This PR adds GC support for blob v2.

Part of #4947

All blob files (whether packed or dedicated) follow the same naming pattern: `data/{data_file_key}/{blob_id:08x}.blob`. These files are only referenced by `data/{data_file_key}.lance`. Therefore, we can implement GC support by checking whether the related data file is still in use.

External blobs are not being considered at this stage.
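
As a rough illustration of the check described above, here is a minimal sketch and not the code from this PR. The helper names `blob_owner_key` and `unreferenced_blobs` are hypothetical, paths are treated as plain strings, and the sketch assumes `{data_file_key}` contains no `/`. It maps each blob path back to its owning data file and reports the blob as collectable only when that data file is no longer in the live set. External blobs are ignored, matching the scope stated here.

```rust
use std::collections::HashSet;

/// Hypothetical helper: extract `{data_file_key}` from a path shaped like
/// `data/{data_file_key}/{blob_id:08x}.blob`. Returns `None` for other paths.
/// Assumes the key itself contains no `/`.
fn blob_owner_key(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("data/")?;
    let (key, file) = rest.split_once('/')?;
    let blob_id = file.strip_suffix(".blob")?;
    // `{:08x}` zero-pads to at least 8 lowercase hex digits.
    let looks_like_id = blob_id.len() >= 8
        && blob_id
            .bytes()
            .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if key.is_empty() || !looks_like_id {
        return None;
    }
    Some(key)
}

/// Hypothetical helper: given every listed path and the set of data files
/// still in use, return the blob files whose owning data file is gone.
/// Paths that do not match the blob naming pattern are never returned.
fn unreferenced_blobs<'a>(
    listed_paths: &[&'a str],
    live_data_files: &HashSet<String>,
) -> Vec<&'a str> {
    listed_paths
        .iter()
        .copied()
        .filter(|&path| match blob_owner_key(path) {
            // A blob is only referenced by `data/{data_file_key}.lance`.
            Some(key) => !live_data_files.contains(&format!("data/{key}.lance")),
            None => false,
        })
        .collect()
}
```

With `data/abc.lance` as the only live data file, a listing containing `data/abc/00000001.blob` and `data/def/0000000a.blob` would yield only the second path, since `data/def.lance` is no longer in use.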


Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

@github-actions github-actions Bot added the enhancement (New feature or request) label Dec 15, 2025
@Xuanwo Xuanwo mentioned this pull request Dec 15, 2025
9 tasks
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@Xuanwo Xuanwo changed the title from "feat(blob_v2): Add GC support" to "feat(blob_v2): add GC support" Dec 15, 2025
@codecov

codecov Bot commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 86.42857% with 19 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/cleanup.rs | 86.42% | 15 Missing and 4 partials ⚠️ |

Member

@westonpace westonpace left a comment

Looks good. My only suggestion is to add some logging (we should probably have this logging in the other cleanup spots too, but better late than never).

4 comment threads on rust/lance/src/dataset/cleanup.rs
@Xuanwo Xuanwo merged commit 671285f into main Dec 16, 2025
27 checks passed
@Xuanwo Xuanwo deleted the xuanwo/blobv2-gc branch December 16, 2025 15:21
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026