feat(python): support namespace for tensorflow#5750
feat(python): support namespace for tensorflow#5750wjones127 merged 2 commits intolance-format:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds namespace support to the TensorFlow integration (lance.tf.data.from_lance), enabling users to specify tables using namespace and table_id instead of direct URIs. This brings the TensorFlow API in line with the core lance.dataset() function's namespace capabilities.
Changes:
- Added
namespace,table_id, andignore_namespace_table_storage_optionsparameters tofrom_lance() - Made the
datasetparameter optional to support namespace-based table resolution - Added validation to prevent mixing LanceDataset instances with namespace/table_id parameters
- Added comprehensive test coverage for the new functionality
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/python/lance/tf/data.py | Added namespace support parameters to from_lance() function, updated documentation, and implemented parameter forwarding to lance.dataset() |
| python/python/tests/test_tf.py | Added test to verify namespace and table_id parameters are correctly forwarded and functional |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if isinstance(dataset, LanceDataset): | ||
| if namespace is not None or table_id is not None: | ||
| raise ValueError( | ||
| "Cannot specify 'namespace' or 'table_id' when passing a LanceDataset instance" | ||
| ) | ||
| else: | ||
| dataset = lance.dataset( | ||
| dataset, | ||
| namespace=namespace, | ||
| table_id=table_id, | ||
| ignore_namespace_table_storage_options=ignore_namespace_table_storage_options, | ||
| ) |
There was a problem hiding this comment.
Missing validation to ensure that either dataset or both namespace and table_id are provided. When all three parameters are None, the function will pass None to lance.dataset() which will raise an error, but the error message will mention 'uri' parameter which could be confusing since this function uses 'dataset' as the parameter name. Consider adding explicit validation here with a clear error message that matches the parameter names used in this function.
wjones127
left a comment
There was a problem hiding this comment.
Looks good. I think there are improvements to the API we can make as a follow up once we adjust lance.dataset.
| def from_lance( | ||
| dataset: Union[str, Path, LanceDataset], | ||
| dataset: Optional[Union[str, Path, LanceDataset]] = None, |
There was a problem hiding this comment.
I find the new API a little clunky, but I recognize you are just matching what is is in lance.dataset. I created an issue for improving this API: #5809
|
Test failures are unrelated. |
Co-authored-by: Will Jones <willjones127@gmail.com>
No description provided.