feat(python): support cleanup_with_policy #5458

Merged
wjones127 merged 7 commits into lance-format:main from ddupg:feat/py-cleanup-policy
Dec 16, 2025

Contributor

@ddupg ddupg commented Dec 11, 2025

No description provided.

@chatgpt-codex-connector
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@github-actions github-actions bot added the enhancement and python labels Dec 11, 2025
@ddupg ddupg force-pushed the feat/py-cleanup-policy branch from f73bd53 to 2f20e78 Compare December 11, 2025 09:39
Contributor

@majin1102 majin1102 left a comment

Left a comment, PTAL when you have time

Comment thread python/python/lance/lance/__init__.pyi Outdated
@ddupg ddupg force-pushed the feat/py-cleanup-policy branch from 6320c16 to 5447cfe Compare December 11, 2025 13:05
Contributor

@wjones127 wjones127 left a comment

Got a question, as this seems like it might be overcomplicating things. There's only one new parameter, right?

Comment thread python/python/lance/dataset.py Outdated
td_to_micros(older_than), delete_unverified, error_if_tagged_old_versions
)

def cleanup_with_policy(self, policy: CleanupPolicy) -> CleanupStats:
Contributor

Why not just add the new retain_versions parameter to cleanup_old_versions, why a whole new method?

Contributor Author

Thanks for your attention, @wjones127.

> Why not just add the new retain_versions parameter to cleanup_old_versions, why a whole new method?

I just wanted to align with Rust's cleanup_with_policy. I can add a new retain_versions parameter to cleanup_old_versions instead, but is it okay if the Python API differs from the Rust one?

Or should we first merge Rust's cleanup_old_versions and cleanup_with_policy into one?

Contributor

I think it’s fine if the API design differs between Python and Rust. Python has named arguments, which is often a natural API design for Python users. Rust doesn’t, and instead needs to use option structs and builder patterns to make configuration readable.

It seems like this should just be another named argument in Python.
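
As a rough illustration of that point, the two styles could be compared as in the sketch below. Every name here (`CleanupPolicySketch`, `cleanup_with_policy_sketch`, `cleanup_old_versions_sketch`) is a hypothetical stand-in, not Lance's actual class or signature; the functions only echo their options back so the two call styles can be compared.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


# Hypothetical stand-in for a Rust-style options struct; not Lance's real class.
@dataclass
class CleanupPolicySketch:
    older_than: Optional[timedelta] = None
    retain_versions: Optional[int] = None
    delete_unverified: bool = False


def cleanup_with_policy_sketch(policy: CleanupPolicySketch) -> dict:
    """Policy-object style: the caller builds a struct before calling."""
    # Echo the resolved options instead of touching any dataset.
    return {
        "older_than": policy.older_than,
        "retain_versions": policy.retain_versions,
        "delete_unverified": policy.delete_unverified,
    }


def cleanup_old_versions_sketch(
    older_than: Optional[timedelta] = None,
    *,
    retain_versions: Optional[int] = None,
    delete_unverified: bool = False,
) -> dict:
    """Keyword-argument style: each option is named at the call site."""
    policy = CleanupPolicySketch(older_than, retain_versions, delete_unverified)
    return cleanup_with_policy_sketch(policy)


# Both calls describe the same cleanup; the second needs no extra object.
via_policy = cleanup_with_policy_sketch(
    CleanupPolicySketch(older_than=timedelta(days=7), retain_versions=3)
)
via_kwargs = cleanup_old_versions_sketch(
    older_than=timedelta(days=7), retain_versions=3
)
```

With keyword arguments, a new option such as retain_versions can be added with a default value, so existing callers keep working unchanged.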

Contributor Author

> It seems like this should just be another named argument in Python.

Thanks for the suggestion, @wjones127. I added retain_versions as a named argument to the Python cleanup_old_versions.

@wjones127 wjones127 self-assigned this Dec 12, 2025
Change-Id: Ic0170a9d7cbc8842e6ea7a4a5cb3981407883d3f
Change-Id: I457317aef988763e1120fb0684090d55197ad760
Change-Id: I2f1abae0a422b5f0b156ef343983777efad5b3d3
Change-Id: I9e81cd76a4f8e6c95fe8545c813730e185fcf13d
Change-Id: Ia9de9733eb565aaafbc13dc559fbe7019901ed9e
@ddupg ddupg force-pushed the feat/py-cleanup-policy branch 2 times, most recently from 166a788 to f17cb57 Compare December 15, 2025 12:10
Change-Id: Id4627874a1f2aa85ca2eed9323cbc2d0c8b9fbfe
@ddupg ddupg force-pushed the feat/py-cleanup-policy branch from f17cb57 to 446cf8c Compare December 15, 2025 12:21
time.sleep(0.05)
lance.write_dataset(table, base_dir, mode="overwrite")
moment = datetime.now()
time.sleep(0.05)
Contributor

Why are the sleep calls necessary?

Contributor Author

This test verifies the combined effect of older_than and retain_versions: the expected outcome should satisfy both constraints simultaneously. To keep timing skew from affecting the older_than evaluation and causing flakiness, I added sleep calls to stabilize the boundary conditions.
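
A minimal, self-contained sketch of what "satisfy both constraints simultaneously" could mean is shown below. It does not call Lance. The helper `versions_to_delete` and its semantics are assumptions made for illustration: a version is removed only if it is older than the cutoff and also falls outside the newest `retain_versions` versions, and the latest version is never removed. The sketch uses fixed timestamps rather than sleep calls, so the older_than boundary is unambiguous.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def versions_to_delete(
    versions: List[Tuple[int, datetime]],
    now: datetime,
    older_than: Optional[timedelta] = None,
    retain_versions: Optional[int] = None,
) -> List[int]:
    """Hypothetical model of a combined cleanup rule (not Lance's code).

    `versions` holds (version_number, created_at) pairs in any order.
    """
    ordered = sorted(versions)
    # Assumption: the latest version is always kept.
    candidates = ordered[:-1]

    if older_than is not None:
        cutoff = now - older_than
        # Keep only candidates created strictly before the cutoff.
        candidates = [(v, t) for v, t in candidates if t < cutoff]

    if retain_versions is not None:
        # Guard against retain_versions == 0, where ordered[-0:] is the whole list.
        newest = ordered[-retain_versions:] if retain_versions > 0 else []
        keep = {v for v, _ in newest}
        candidates = [(v, t) for v, t in candidates if v not in keep]

    return [v for v, _ in candidates]


# Four versions written one minute apart, inspected ten minutes after the first.
base = datetime(2025, 12, 15, 12, 0, 0)
versions = [(n, base + timedelta(minutes=n - 1)) for n in range(1, 5)]
now = base + timedelta(minutes=10)
cutoff_age = timedelta(minutes=8, seconds=30)  # cutoff sits between v2 and v3

by_age_only = versions_to_delete(versions, now, older_than=cutoff_age)
by_both = versions_to_delete(
    versions, now, older_than=cutoff_age, retain_versions=3
)
```

Here the age rule alone would remove versions 1 and 2, but combining it with `retain_versions=3` protects version 2, so only version 1 is removed.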

Comment thread python/python/tests/test_dataset.py Outdated
Co-authored-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 merged commit 855e70a into lance-format:main Dec 16, 2025
12 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Co-authored-by: Will Jones <willjones127@gmail.com>
Labels

enhancement New feature or request python

3 participants