refactor: use lance-io object store for dir namespace and improve builder #5045

Merged
jackye1995 merged 6 commits into lance-format:main from jackye1995:arrow-dir
Oct 23, 2025

@jackye1995
Contributor

@jackye1995 jackye1995 commented Oct 22, 2025

I ended up making these changes in both #4984 and #4905, so I decided to pull them out into a separate PR and get them cleaned up first.

This PR moves the directory namespace from using OpenDAL directly to using the Lance ObjectStore. This avoids inconsistency between the storage configuration of the dir namespace and that of the underlying Lance tables. Users can still use OpenDAL, and if we fully migrate Lance to OpenDAL, the change will apply to both layers at the same time.

The PR also reworks namespace construction to use a builder-style API and allows supplying a Lance session. Since we have not published a stable version yet, we do not care about backwards compatibility.
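
As a rough illustration of the builder style described above, here is a minimal, self-contained sketch. Every name in it (`DirNamespaceBuilder`, `DirNamespace`, `Session`, `storage_option`, `with_session`) is hypothetical and only stands in for the real types; it is not the API added by this PR.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a shared session object (not the real Lance type).
#[derive(Debug, Default)]
pub struct Session;

/// Hypothetical namespace produced by the builder.
#[derive(Debug)]
pub struct DirNamespace {
    pub root: String,
    pub storage_options: HashMap<String, String>,
    pub session: Arc<Session>,
}

/// Hypothetical builder: the root is required, everything else is optional.
#[derive(Debug)]
pub struct DirNamespaceBuilder {
    root: String,
    storage_options: HashMap<String, String>,
    session: Option<Arc<Session>>,
}

impl DirNamespaceBuilder {
    /// Start a builder with the only required setting, the namespace root.
    pub fn new(root: impl Into<String>) -> Self {
        Self {
            root: root.into(),
            storage_options: HashMap::new(),
            session: None,
        }
    }

    /// Add a single storage option; can be chained any number of times.
    pub fn storage_option(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
        self.storage_options.insert(key.into(), value.into());
        self
    }

    /// Reuse a caller-supplied session instead of creating a fresh one.
    pub fn with_session(mut self, session: Arc<Session>) -> Self {
        self.session = Some(session);
        self
    }

    /// Consume the builder; fall back to a new session if none was supplied.
    pub fn build(self) -> DirNamespace {
        DirNamespace {
            root: self.root,
            storage_options: self.storage_options,
            session: self.session.unwrap_or_else(|| Arc::new(Session::default())),
        }
    }
}
```

A caller would chain the setters, for example `DirNamespaceBuilder::new("s3://bucket/ns").storage_option("region", "us-east-1").with_session(session).build()`. Passing the session explicitly is what would let the namespace and the tables it opens share one session in a design like this.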

This PR also ensures that the lance-namespace-impls features are consistent with the lance-io features. Related to #5042.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 500 to 504
      };

      // Create the Lance dataset using the actual Lance API
-     Dataset::write(reader, &table_path, Some(write_params))
+     Dataset::write(reader, &table_uri, Some(write_params))
          .await

P1: Preserve storage options when writing datasets

The refactor builds an ObjectStore with the provided storage.* properties but create_table now calls Dataset::write(reader, &table_uri, Some(write_params)) without forwarding those options (write_params is left at the default other than mode). As a result, the storage options collected in DirectoryNamespaceConfig are ignored during dataset creation. For remote backends that depend on those options (custom S3 endpoints, credentials, regions, etc.), Dataset::write will initialize a new object store without the necessary configuration and the table creation will fail. Consider populating WriteParams::store_params (or otherwise reusing the initialized object store) so writes inherit the configured storage options.
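
The idea in this suggestion can be sketched without any Lance types. The following is a simplified model, not the real API: `SketchWriteParams`, `extract_storage_options`, and `write_params_for` are hypothetical names, and stripping the `storage.` prefix from property keys is an assumption made for illustration. In the real code the options would travel through `WriteParams::store_params`, as the review suggests.

```rust
use std::collections::HashMap;

/// Hypothetical helper: keep only `storage.*` properties and strip the prefix
/// (the prefix-stripping convention is an assumption of this sketch).
pub fn extract_storage_options(props: &HashMap<String, String>) -> HashMap<String, String> {
    props
        .iter()
        .filter_map(|(key, value)| {
            key.strip_prefix("storage.")
                .map(|name| (name.to_string(), value.clone()))
        })
        .collect()
}

/// Simplified stand-in for per-write parameters (not the real Lance `WriteParams`).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SketchWriteParams {
    pub mode: String,
    /// Options the writer needs to reach the backend (endpoint, region, ...).
    pub storage_options: Option<HashMap<String, String>>,
}

/// Build write parameters that carry the namespace's storage options along,
/// instead of leaving everything except `mode` at its default.
pub fn write_params_for(props: &HashMap<String, String>) -> SketchWriteParams {
    let options = extract_storage_options(props);
    SketchWriteParams {
        mode: "create".to_string(),
        storage_options: if options.is_empty() { None } else { Some(options) },
    }
}
```

The point of the model is that the write path receives the same options the namespace was configured with, so a writer that builds its own object store does not fall back to defaults.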

@codecov-commenter

codecov-commenter commented Oct 22, 2025

Codecov Report

❌ Patch coverage is 76.94805% with 71 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.70%. Comparing base (633aaa5) to head (056067a).
⚠️ Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-namespace-impls/src/rest.rs | 59.22% | 42 Missing ⚠️ |
| rust/lance-namespace-impls/src/dir.rs | 79.67% | 24 Missing and 1 partial ⚠️ |
| rust/lance-namespace-impls/src/connect.rs | 95.12% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5045      +/-   ##
==========================================
- Coverage   81.74%   81.70%   -0.04%     
==========================================
  Files         340      340              
  Lines      137505   138554    +1049     
  Branches   137505   138554    +1049     
==========================================
+ Hits       112397   113208     +811     
- Misses      21370    21618     +248     
+ Partials     3738     3728      -10     

| Flag | Coverage Δ |
| --- | --- |
| unittests | 81.70% <76.94%> (-0.04%) ⬇️ |

@jackye1995 jackye1995 changed the title from "refactor: use lance-io object store for dir namespace" to "refactor: use lance-io object store for dir namespace and improve builder" Oct 23, 2025
Collaborator

@Xuanwo Xuanwo left a comment

LGTM as long as all CI passes. But I think the aws-* crates are not introduced by OpenDAL.

@jackye1995
Contributor Author

Oh yeah, that's a separate issue, not introduced by OpenDAL. Let me clarify in the description.

@jackye1995
Contributor Author

Known flaky test #4925

@jackye1995 jackye1995 merged commit c73c3dc into lance-format:main Oct 23, 2025
26 of 27 checks passed
jackye1995 added a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…lder (lance-format#5045)
