feat(python): support shallow_clone #4949

Merged
jackye1995 merged 22 commits into lance-format:main from majin1102:python-shallow_clone
Oct 20, 2025

Conversation

@majin1102
Contributor

Close #4856

github-actions bot added the enhancement and python labels Oct 14, 2025
@majin1102 majin1102 force-pushed the python-shallow_clone branch from 8326f30 to 81cdcdc Compare October 14, 2025 13:51
@majin1102 majin1102 marked this pull request as draft October 14, 2025 16:13
@majin1102 majin1102 force-pushed the python-shallow_clone branch from f18f27e to 1eeb208 Compare October 14, 2025 17:18
@majin1102 majin1102 force-pushed the python-shallow_clone branch from 1eeb208 to e66f684 Compare October 14, 2025 17:33
@majin1102 majin1102 force-pushed the python-shallow_clone branch 3 times, most recently from 62f77eb to 6acad63 Compare October 15, 2025 07:48
@majin1102 majin1102 marked this pull request as ready for review October 15, 2025 07:48
@majin1102
Contributor Author

Ready for review @jackye1995

@majin1102 majin1102 force-pushed the python-shallow_clone branch from 924deb2 to fa61802 Compare October 16, 2025 10:15
Comment thread python/python/lance/dataset.py Outdated
----------
target_path : str or Path
The URI or filesystem path to clone the dataset into.
version : int or str
Contributor

This is not up to date with the Union[int, str, Tuple[int, str]] type.

Comment thread python/python/lance/dataset.py Outdated

new_inner = self._ds.shallow_clone(target_uri, version, storage_options)

ds = LanceDataset.__new__(LanceDataset)
Contributor

Can we just load a new dataset directly, instead of overriding these parameters? It feels fragile if we later add anything new to LanceDataset that needs to be overridden.
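
To illustrate the concern, here is a minimal sketch using a hypothetical ToyDataset class (it is not the real LanceDataset, and the attribute names are assumptions). An object built through `__new__` skips `__init__`, so any attribute added to `__init__` later is silently missing unless the bypassing code is updated as well.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset wrapper; not the real LanceDataset."""

    def __init__(self, uri, storage_options=None):
        self._uri = uri
        self._storage_options = storage_options
        # Imagine this attribute was added after the clone helper was written.
        self._default_scan_options = {}


def clone_by_bypassing_init(uri):
    # Fragile: __init__ never runs, so every attribute must be copied by hand.
    ds = ToyDataset.__new__(ToyDataset)
    ds._uri = uri
    ds._storage_options = None
    return ds


def clone_by_reopening(uri):
    # More robust: the constructor sets up every attribute, including new ones.
    return ToyDataset(uri)


fragile = clone_by_bypassing_init("memory://clone")
robust = clone_by_reopening("memory://clone")

# The bypassed object lacks the attribute that was added later.
print(hasattr(fragile, "_default_scan_options"))
print(hasattr(robust, "_default_scan_options"))
```

Going through the normal constructor keeps the clone path in step with whatever `__init__` does, at the cost of opening the dataset a second time.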

Contributor Author

Sounds reasonable.

So we just ignore the dataset returned by shallow_clone, right? The manifest is cached anyway. I just want to make sure this is noticed.

Contributor

We could add a static method like LanceDataset.clone_from to do it, but I don't think it is worth the complexity. Let me know if you disagree.

This reminds me that we should add kwargs so the user can pass in whatever other parameters (e.g. read_params, default_scan_options) they want to use when opening the cloned dataset.
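
A rough sketch of the kwargs idea, again with a hypothetical ToyDataset rather than the real LanceDataset. The signature shown here is an assumption for illustration, and the native clone step is elided; the point is only that extra keyword arguments are forwarded unchanged to the constructor that opens the clone.

```python
class ToyDataset:
    """Hypothetical stand-in; the real LanceDataset constructor may differ."""

    def __init__(self, uri, storage_options=None, **open_kwargs):
        self.uri = uri
        self.storage_options = storage_options
        # Options such as read_params or default_scan_options would land here.
        self.open_kwargs = open_kwargs

    def shallow_clone(self, target_uri, version, storage_options=None, **kwargs):
        # The actual clone (writing a new manifest at target_uri) is elided;
        # this sketch only shows how the result is opened afterwards.
        # Open a fresh dataset at the target, forwarding the caller's options.
        return ToyDataset(target_uri, storage_options=storage_options, **kwargs)


source = ToyDataset("memory://source")
clone = source.shallow_clone(
    "memory://clone",
    1,
    default_scan_options={"batch_size": 64},
)
print(clone.uri, clone.open_kwargs)
```

With this shape, new open-time options need no changes to the clone method, since it never inspects the forwarded keywords.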

Contributor Author

Done, PTAL when you have time

@jackye1995 jackye1995 closed this Oct 20, 2025
@jackye1995 jackye1995 reopened this Oct 20, 2025
@jackye1995
Contributor

The error is unrelated to this PR and also exists on the main branch, so merging.

@jackye1995 jackye1995 merged commit 96de89d into lance-format:main Oct 20, 2025
14 of 32 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Close lance-format#4856

---------

Co-authored-by: majin.nathan <majin.nathan@bytedance.com>
Development

Successfully merging this pull request may close these issues.

Support shallow_clone in Python

2 participants