feat: support namespace as external manifest store#5968
feat: support namespace as external manifest store#5968jackye1995 merged 23 commits intolance-format:mainfrom
Conversation
PR ReviewP0: Build will break - Invalid dependency pathCargo.toml:69 changes lance-namespace-reqwest-client to a local path outside the repository (../lance-namespace/rust/lance-namespace-reqwest-client). This will fail for anyone without a sibling lance-namespace directory. P1: Race condition in create_table_versionrust/lance-namespace-impls/src/dir.rs:693-721 has a TOCTOU race condition between checking if the version exists and copying the manifest. This defeats the put_if_not_exists semantics. Consider using copy_if_not_exists or similar atomic operation. P1: put_if_exists contradicts implementationrust/lance/src/io/commit/namespace_manifest.rs:114-133 - The put_if_exists method calls create_table_version, but create_table_version rejects existing versions. This needs a separate update API or reconciled semantics. Summary
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
a411f3d to
8311d09
Compare
| /// implement a staging-based workflow. Implementations that can write directly | ||
| /// (e.g., namespace-backed stores) should override this method. | ||
| #[allow(clippy::too_many_arguments)] | ||
| async fn put( |
There was a problem hiding this comment.
this is probably a good middle ground between external manifest store today vs doing a whole new manifest store. We will have a single generic put to put a manifest to the store that can be overridden. It gives more freedom compared to having to follow exactly the same flow with put_if_not_exist and then put_if_exists
|
|
||
| @pytest.mark.skipif( | ||
| sys.platform == "win32", | ||
| reason="External manifest store has known issues on Windows", |
There was a problem hiding this comment.
looks like external manifest store code path in general has compatibility issue with windows, skip related tests for now and fix in a separated PR
Without this fix, if user directly use the native table to do operations like `add_columns`, even if it is configured to use namespace db connection, it is not really propagated through. The fix is to bring lancedb's python binding up to date and do a similar implementation as lance-format/lance#5968, and make sure the namespace is fully propagated through all the related calls. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This PR is based on discussion in #5849
I have added 4 APIs in Lance Namespace spec:
descendingto allow return versions in descending order)With these APIs, we can fulfill the
ExternalManfiestStoretrait with aLanceNamespaceExternalManfiestStoreby:limit=1,descending=falseput_if_not_existandput_if_exists)We also added a basic implementation in
DirectoryNamespacewith:In
DirectoryNamespace, we added a flagtable_version_tracking_enabled. When true, when user callsdescribe_tableordeclare_tableto get table information, we return a new flag in response sayingmanaged_versioning=true, indicating that the namespace should manage table version commit and resolution. Based on that information, we set the commit handler to be external manifest store commit handler, and the external manifest store to beLanceNamespaceExternalManfiestStore.