
feat(cleanup): add more metrics to RemovalStats#6025

Merged
majin1102 merged 6 commits into lance-format:main from zhangyue19921010:cleanup-result
Mar 5, 2026

@zhangyue19921010
Contributor

Add a new metric, removed_data_file_num, to the RemovalStats returned by the cleanup operation, so that users can easily see how many Lance data files were deleted in the current cleanup and better evaluate its impact.

@github-actions github-actions Bot added enhancement New feature or request python java labels Feb 26, 2026
@zhangyue19921010 zhangyue19921010 changed the title feat(cleanup): add removed_data_file_num to RemovalStats feat(cleanup): add removed_data_file_num metric to RemovalStats Feb 26, 2026
Contributor

@majin1102 majin1102 left a comment


Thanks for working on this. I left one comment.

Comment thread python/src/dataset/cleanup.rs Outdated
pub struct CleanupStats {
pub bytes_removed: u64,
pub old_versions: u64,
pub removed_data_file_num: u64,
Contributor

This is a good start.

I'm wondering if we could also include counts for transaction files, index files, deletion files, etc. I also think removed_data_files might be a better name. We have similar fields in ManifestSummary. data_files_removed might be another option.
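A minimal sketch of what per-kind counting could look like, assuming each deleted file can be classified into one of the four categories mentioned above. `RemovedFileKind`, `RemovedFileCounts`, and `tally` are hypothetical names used only for illustration; they are not taken from the lance codebase.

```rust
/// Hypothetical classification of a file deleted during cleanup.
/// The variants mirror the categories named in the comment above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemovedFileKind {
    Data,
    Transaction,
    Index,
    Deletion,
}

/// Hypothetical per-kind counters (illustrative, not lance's struct).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RemovedFileCounts {
    pub data: u64,
    pub transaction: u64,
    pub index: u64,
    pub deletion: u64,
}

impl RemovedFileCounts {
    /// Record one removed file of the given kind.
    pub fn record(&mut self, kind: RemovedFileKind) {
        match kind {
            RemovedFileKind::Data => self.data += 1,
            RemovedFileKind::Transaction => self.transaction += 1,
            RemovedFileKind::Index => self.index += 1,
            RemovedFileKind::Deletion => self.deletion += 1,
        }
    }
}

/// Count removed files by kind, starting from all-zero counters.
pub fn tally(kinds: impl IntoIterator<Item = RemovedFileKind>) -> RemovedFileCounts {
    let mut counts = RemovedFileCounts::default();
    for kind in kinds {
        counts.record(kind);
    }
    counts
}
```

Keeping one counter per kind, rather than a single total, lets a caller tell a cleanup that dropped mostly data files apart from one that dropped mostly index or transaction files.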

@codecov

codecov Bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@zhangyue19921010 zhangyue19921010 changed the title feat(cleanup): add removed_data_file_num metric to RemovalStats feat(cleanup): add more metrics to RemovalStats Feb 27, 2026
@zhangyue19921010
Contributor Author

Hi @majin1102 Thanks a lot for your review. All comments are addressed. PTAL~

Collaborator

@Xuanwo Xuanwo left a comment


Thank you for working on this! I only have one suggestion on naming; otherwise LGTM.

Comment thread python/src/dataset/cleanup.rs Outdated
Comment on lines +22 to +25
pub removed_data_files: u64,
pub removed_transaction_files: u64,
pub removed_index_files: u64,
pub removed_deletion_files: u64,
Collaborator

Suggested change
- pub removed_data_files: u64,
- pub removed_transaction_files: u64,
- pub removed_index_files: u64,
- pub removed_deletion_files: u64,
+ pub data_files_removed: u64,
+ pub transaction_files_removed: u64,
+ pub index_files_removed: u64,
+ pub deletion_files_removed: u64,

I suggest we keep the same naming style as the existing bytes_removed.
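For reference, a sketch of how the stats struct might read with the `*_removed` naming applied throughout. The field names are the ones discussed in this thread. The struct as written here is illustrative rather than the merged definition, and `total_files_removed` and `summary` are hypothetical helpers added only to show the fields in use.

```rust
/// Illustrative cleanup stats using the `*_removed` suffix consistently.
/// Not the actual lance definition; field names are taken from this thread.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct CleanupStats {
    pub bytes_removed: u64,
    pub old_versions: u64,
    pub data_files_removed: u64,
    pub transaction_files_removed: u64,
    pub index_files_removed: u64,
    pub deletion_files_removed: u64,
}

impl CleanupStats {
    /// Hypothetical helper: total files removed across the four per-kind counters.
    pub fn total_files_removed(&self) -> u64 {
        self.data_files_removed
            + self.transaction_files_removed
            + self.index_files_removed
            + self.deletion_files_removed
    }

    /// Hypothetical helper: one-line, human-readable summary of the cleanup.
    pub fn summary(&self) -> String {
        format!(
            "removed {} bytes across {} files ({} data, {} transaction, {} index, {} deletion)",
            self.bytes_removed,
            self.total_files_removed(),
            self.data_files_removed,
            self.transaction_files_removed,
            self.index_files_removed,
            self.deletion_files_removed,
        )
    }
}
```

With a shared suffix, the counters line up with `bytes_removed`, so callers can read every field as "how much of X was removed".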

Collaborator

@Xuanwo Xuanwo left a comment


Thank you for working on this, LGTM. I'm fine with merging once all CI checks pass.

@majin1102 majin1102 merged commit db07e77 into lance-format:main Mar 5, 2026
36 of 37 checks passed

Labels

enhancement New feature or request java python


3 participants