
feat: support dynamic storage options provider with AWS credentials vending#4905

Merged
jackye1995 merged 4 commits into lance-format:main from jackye1995:reimport-lns
Oct 31, 2025

Conversation

@jackye1995
Contributor

@jackye1995 jackye1995 commented Oct 7, 2025

This PR introduces a new dynamic storage options provider interface in the Lance dataset. The main idea is that the provider describes the storage options to use for a given dataset and when those options will expire. Lance is responsible for fetching a new set of storage options and re-initializing the object store when expiration happens.

This is mainly useful for cases where the dataset's access credentials are temporary and the user would like to invoke a specific credentials endpoint to fetch a new set of credentials. Currently we have only added support for AWS, by implementing a credentials provider.

The PR also provides an implementation of the StorageOptionsProvider with Lance Namespace, because Lance Namespace provides a DescribeTable endpoint which returns the storage options that should be used at the given time. Based on the namespace spec, an expires_at_millis key can be added to the storage options to indicate the expiration time of those options.

Because Lance Namespace provides native implementations in Python and Java, we also provide the binding interfaces PyStorageOptionsProvider and JavaStorageOptionsProvider, which are then further integrated with the Lance Namespace implementations in those languages.
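The contract described above can be sketched in a few lines. This is a simplified, synchronous illustration (the actual trait in lance-io is async); the `StaticProvider` type, method names, and refresh helper here are hypothetical, not the PR's API:

```rust
use std::collections::HashMap;

/// Simplified sketch of a storage options provider: returns the options to
/// use plus the epoch time in milliseconds at which those options expire.
trait StorageOptionsProvider {
    fn fetch_storage_options(&self) -> (HashMap<String, String>, u64);
}

/// Hypothetical provider that vends a fixed set of options (e.g. for tests).
struct StaticProvider {
    options: HashMap<String, String>,
    expires_at_millis: u64,
}

impl StorageOptionsProvider for StaticProvider {
    fn fetch_storage_options(&self) -> (HashMap<String, String>, u64) {
        (self.options.clone(), self.expires_at_millis)
    }
}

/// Lance's side of the contract: when the current options are expired,
/// fetch a new set. Returning `true` signals that the object store must be
/// re-initialized with the fresh options.
fn refresh_if_expired(
    provider: &dyn StorageOptionsProvider,
    now_millis: u64,
    current: &mut (HashMap<String, String>, u64),
) -> bool {
    if now_millis >= current.1 {
        *current = provider.fetch_storage_options();
        true
    } else {
        false
    }
}
```

In the real PR the refresh is driven from inside the object store path rather than by the caller; the sketch only shows the provider/expiry handshake.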

@jackye1995 jackye1995 marked this pull request as draft October 7, 2025 06:50
@github-actions github-actions Bot added the enhancement New feature or request label Oct 7, 2025
@jackye1995 jackye1995 changed the title from "feat: import lance-namespace into lance" to "feat: import lance-namespace into lance and support credentials vending" Oct 7, 2025
@github-actions github-actions Bot added the python label Oct 7, 2025
@codecov-commenter

codecov-commenter commented Oct 7, 2025

Codecov Report

❌ Patch coverage is 61.46045% with 190 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.73%. Comparing base (9ed9ee2) to head (7855872).

Files with missing lines                           | Patch % | Lines
rust/lance-io/src/object_store/providers/aws.rs    | 78.08%  | 76 Missing and 4 partials ⚠️
rust/lance/src/dataset/builder.rs                  | 4.68%   | 59 Missing and 2 partials ⚠️
rust/lance-io/src/object_store/storage_options.rs  | 0.00%   | 25 Missing ⚠️
rust/lance-namespace-impls/src/rest.rs             | 0.00%   | 10 Missing ⚠️
rust/lance-namespace-impls/src/dir.rs              | 0.00%   | 9 Missing ⚠️
rust/lance-io/src/object_store.rs                  | 70.58%  | 1 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4905      +/-   ##
==========================================
- Coverage   81.77%   81.73%   -0.05%     
==========================================
  Files         340      341       +1     
  Lines      140102   140593     +491     
  Branches   140102   140593     +491     
==========================================
+ Hits       114568   114907     +339     
- Misses      21729    21867     +138     
- Partials     3805     3819      +14     
Flag Coverage Δ
unittests 81.73% <61.46%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@github-actions github-actions Bot added the java label Oct 8, 2025
@jackye1995 jackye1995 force-pushed the reimport-lns branch 2 times, most recently from 4fedbd1 to 0169ede Compare October 10, 2025 20:40
@jackye1995 jackye1995 marked this pull request as ready for review October 14, 2025 17:31
@wjones127 wjones127 self-assigned this Oct 14, 2025
Contributor

@wjones127 wjones127 left a comment


Got stuck trying to understand the basics of how this works.

Comment on lines +42 to +43
/// and expires_at_millis is the epoch time in milliseconds when credentials expire
async fn get_credentials(&self) -> Result<(HashMap<String, String>, u64)>;
Contributor


Could we use a proper time type here, like std::time::Instant?

Suggested change
/// and expires_at_millis is the epoch time in milliseconds when credentials expire
async fn get_credentials(&self) -> Result<(HashMap<String, String>, u64)>;
/// and expires_at_millis is the epoch time in milliseconds when credentials expire
async fn get_credentials(&self) -> Result<(HashMap<String, String>, Instant)>;

Comment on lines +112 to +114
/// How early to refresh credentials before expiration (in milliseconds)
/// Default: 300,000 (5 minutes)
pub refresh_lead_time_ms: u64,
Contributor


Similarly, can we use a proper type here like std::time::Duration?

Suggested change
/// How early to refresh credentials before expiration (in milliseconds)
/// Default: 300,000 (5 minutes)
pub refresh_lead_time_ms: u64,
/// How early to refresh credentials before expiration (in milliseconds)
/// Default: 300,000 (5 minutes)
pub refresh_lead_time_ms: Duration,

Comment on lines +306 to +309
pub struct DelegatingObjectStore {
wrapper: Arc<CredentialVendingObjectStoreWrapper>,
inner: Arc<dyn OSObjectStore>,
}
Contributor


How are the credentials passed to the object store?

Contributor


I've been looking through this code, and I haven't been able to figure out how the credentials are passed to the actual object store instance. I see them being refreshed to the side, but I don't see how the object store is supposed to pick up the new credentials.

What I would expect is somewhere you build a CredentialProvider and then you pass it into ObjectStore::with_credentials() when constructing the object store. For example, that's how we have credential refresh for AWS working right now:

https://github.com/lancedb/lance/blob/d342d5db2da90edcbcf4bd88aa38985bcb111aa8/rust/lance-io/src/object_store/providers/aws.rs#L58-L64
https://github.com/lancedb/lance/blob/d342d5db2da90edcbcf4bd88aa38985bcb111aa8/rust/lance-io/src/object_store/providers/aws.rs#L80-L82
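The pattern being described — construct a credential provider once, hand it to the store at construction time, and let every request pull a (cached, auto-refreshed) credential from it — can be sketched as follows. This is a synchronous simplification with hypothetical names; the real `object_store::CredentialProvider` trait is async and returns `Arc<AwsCredential>`:

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Clone, Debug)]
struct AwsCredentials {
    key_id: String,
    secret_key: String,
}

/// Caches a credential and re-runs `fetch` shortly before it expires.
/// The object store holds one of these and queries it per request.
struct RefreshingProvider<F: Fn() -> (AwsCredentials, Instant)> {
    fetch: F,
    lead_time: Duration, // refresh this early, before actual expiry
    cached: Mutex<Option<(AwsCredentials, Instant)>>,
}

impl<F: Fn() -> (AwsCredentials, Instant)> RefreshingProvider<F> {
    /// Cheap on the hot path: only calls `fetch` when a refresh is due.
    fn get_credential(&self) -> AwsCredentials {
        let mut cached = self.cached.lock().unwrap();
        let needs_refresh = match &*cached {
            Some((_, expires_at)) => Instant::now() + self.lead_time >= *expires_at,
            None => true,
        };
        if needs_refresh {
            *cached = Some((self.fetch)());
        }
        cached.as_ref().unwrap().0.clone()
    }
}
```

The key design point in the review comment is that the store never sees expiry bookkeeping; it just calls `get_credential()` and the provider handles staleness internally.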

@jackye1995 jackye1995 force-pushed the reimport-lns branch 3 times, most recently from d254ba9 to 4443b04 Compare October 17, 2025 05:01
@jackye1995 jackye1995 changed the title from "feat: import lance-namespace into lance and support credentials vending" to "feat: support credentials vending" Oct 17, 2025
@jackye1995 jackye1995 marked this pull request as draft October 17, 2025 07:09
@jackye1995
Contributor Author

Actually it might be better to call this feature more generically, like StorageOptionProvider. Let me do another round of refactoring.

@jackye1995 jackye1995 changed the title from "feat: support credentials vending" to "feat: support dynamic storage options provider" Oct 21, 2025
@jackye1995 jackye1995 force-pushed the reimport-lns branch 4 times, most recently from 1a26702 to 28e26e8 Compare October 22, 2025 14:41
jackye1995 added a commit that referenced this pull request Oct 23, 2025
…lder (#5045)

I ended up doing these in #4984 and
#4905 so I decided to pull it out
and get it cleaned up first.

This PR moves the directory namespace from using OpenDAL directly to using the Lance ObjectStore. This avoids the inconsistency between the dir namespace and the underlying Lance table storage configurations. Users can still use OpenDAL, and if we fully migrate Lance to OpenDAL it will be applied to both layers at the same time.

The PR also reworks the namespace builders in builder style and allows supplying a Lance session. Since we have not published a stable version yet, we do not care about backwards compatibility.

This PR also ensures the lance-namespace-impls features are consistent with the lance-io features. Related to
#5042
@jackye1995 jackye1995 force-pushed the reimport-lns branch 5 times, most recently from 7ab0258 to c3ffec3 Compare October 23, 2025 18:50
@jackye1995 jackye1995 force-pushed the reimport-lns branch 2 times, most recently from d7631ae to fda3dda Compare October 23, 2025 21:29
@jackye1995 jackye1995 changed the title from "feat: support dynamic storage options provider" to "feat: support dynamic storage options provider with AWS credentials vending" Oct 25, 2025
@jackye1995 jackye1995 marked this pull request as ready for review October 25, 2025 02:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

https://github.com/lancedb/lance/blob/e1f2a2cea032d1e2412f0ba5e32400d47d5dd633/rust/lance-io/src/object_store.rs#L224-L279
P1 Badge: storage_options_provider excluded from ObjectStoreParams cache key

The new dynamic credential provider is carried on ObjectStoreParams.storage_options_provider, but the Hash/PartialEq implementations used by the object‑store registry still ignore this field. Two datasets that point to the same bucket but use different StorageOptionsProvider implementations will hash to the same key and be treated as equal, so the registry can return a cached ObjectStore initialized with the wrong provider. This means credential refreshes (and even initial credentials) for dataset A can be reused for dataset B, leading to authentication failures or credential leakage across tables. The provider pointer should be included in Hash/eq to ensure object store caching respects different credential sources.
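A minimal sketch of the fix Codex is asking for — including the provider in the registry's cache key — assuming a `provider_id()`-style identity (which the trait docs quoted later in this thread describe). The `Params` struct here is a hypothetical stand-in for `ObjectStoreParams`, not the actual definition:

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

trait StorageOptionsProvider {
    /// Unique identifier used for equality and hashing.
    fn provider_id(&self) -> String;
}

/// Hypothetical stand-in for ObjectStoreParams: the provider now
/// participates in the cache key via its provider_id, so two datasets
/// with different credential sources get different ObjectStore entries.
struct Params {
    storage_options: HashMap<String, String>,
    provider: Option<Arc<dyn StorageOptionsProvider>>,
}

impl PartialEq for Params {
    fn eq(&self, other: &Self) -> bool {
        self.storage_options == other.storage_options
            && self.provider.as_ref().map(|p| p.provider_id())
                == other.provider.as_ref().map(|p| p.provider_id())
    }
}
impl Eq for Params {}

impl Hash for Params {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash a sorted view of the options so the hash is deterministic.
        let mut opts: Vec<_> = self.storage_options.iter().collect();
        opts.sort();
        opts.hash(state);
        self.provider.as_ref().map(|p| p.provider_id()).hash(state);
    }
}
```

Keeping `Hash` and `PartialEq` derived from the same fields preserves the invariant that equal values hash equally, which the registry's map relies on.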


@jackye1995
Contributor Author

The new dynamic credential provider is carried on ObjectStoreParams.storage_options_provider, but the Hash/PartialEq implementations used by the object‑store registry still ignore this field.

Good point. Added the Rust one locally; the Python and Java ones are pending merge.

jackye1995 added a commit to lance-format/lance-namespace that referenced this pull request Oct 27, 2025
Collaborator

@Xuanwo Xuanwo left a comment


Thank you for working on this!

Contributor

@wjones127 wjones127 left a comment


Is there a specific use case for the dynamic storage options beyond credentials? If not, I think I'd prefer to make this just a credential provider.

Comment on lines +23 to +34
/// Trait for providing storage options with expiration tracking
///
/// Implementations can fetch storage options from various sources (namespace servers,
/// secret managers, etc.) and are usable from Python/Java.
///
/// # Equality and Hashing
///
/// Implementations must provide `provider_id()` which returns a unique identifier for
/// equality and hashing purposes. Two providers with the same ID are considered equal
/// and will share the same cached ObjectStore in the registry.
#[async_trait]
pub trait StorageOptionsProvider: Send + Sync + fmt::Debug {
Contributor


Is there a reason to make this generic storage options? If not, I think we should narrow it to just be credentials.

Contributor Author


In the near term, there might not be.

Thinking about what similar features enable in Iceberg, there are things like

  • bucket aliasing, where the user can say "for this bucket name, swap to use another one temporarily"
  • S3 tags, where the user can configure dynamic tags to be assigned to the files written by specific sessions

that could be enabled.

Basically, everything that is controllable in the Iceberg FileIO can be controlled dynamically this way, so I was thinking about making equivalent changes in Lance for ObjectStore.

But I don't have a strong opinion; it's probably fine to also just do the features case by case.

Contributor


Okay I'm more open to this. The tags idea seems cool. I also could see a future where you might want the catalog to tell you which KMS keys to use for an S3 bucket. 👍

Comment on lines +323 to +329
// Case 4: provider + credentials without expiration - FAIL
(None, Some(_), _) => Err(Error::IO {
source: Box::new(std::io::Error::other(
"expires_at_millis is required when using storage_options_provider with credentials",
)),
location: location!(),
}),
Contributor


I think we can trust if they pass a credential provider they already handled expiration, right?

Contributor Author


Yes, this is more for completeness.

Comment on lines +494 to +498
pub struct DynamicStorageOptionsCredentialProvider {
provider: Arc<dyn StorageOptionsProvider>,
cache: Arc<RwLock<Option<CachedCredential>>>,
refresh_offset: Duration,
}
Contributor


If I wanted to be fancy, I would do something like:

pub struct DynamicCredentials(pub Arc<HashMap<String, String>>);

pub struct NamespaceCredentialsProvider<C> {
    provider: Arc<dyn StorageOptionsProvider>,
    cache: Arc<RwLock<Option<CachedCredential>>>,
    refresh_offset: Duration,
    credential_type: PhantomData<C>,
}

impl<C: TryFrom<DynamicCredentials>> NamespaceCredentialsProvider<C> {
    async fn fetch_credential(&self) -> Result<C> {
        let credential: DynamicCredentials = todo!();
        credential.try_into()
    }
}

impl CredentialProvider for NamespaceCredentialsProvider<AwsCredentials> {
    type Credential = AwsCredentials;

    async fn get_credential(&self) -> ObjectStoreResult<Arc<Self::Credential>> {
        self.fetch_credential().await
    }
}

impl TryFrom<DynamicCredentials> for AwsCredentials { /* ... */ }
impl TryFrom<DynamicCredentials> for AzureCredentials { /* ... */ }
impl TryFrom<DynamicCredentials> for GcpCredentials { /* ... */ }

Then you don't need to repeat this adapter for all three clouds. The only thing that's cloud specific is how to map from DynamicCredentials to the particular credentials type.
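A minimal runnable version of that shape (synchronous, with a hypothetical error type and only the AWS conversion filled in) shows how `TryFrom` keeps the cloud-specific part down to one conversion per backend:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Raw key/value storage options as vended by the namespace.
struct DynamicCredentials(Arc<HashMap<String, String>>);

#[derive(Debug, PartialEq)]
struct AwsCredentials {
    key_id: String,
    secret_key: String,
}

/// The only cloud-specific code: map raw options to typed credentials.
impl TryFrom<DynamicCredentials> for AwsCredentials {
    type Error = String;
    fn try_from(c: DynamicCredentials) -> Result<Self, Self::Error> {
        Ok(AwsCredentials {
            key_id: c.0.get("aws_access_key_id")
                .ok_or("missing aws_access_key_id")?.clone(),
            secret_key: c.0.get("aws_secret_access_key")
                .ok_or("missing aws_secret_access_key")?.clone(),
        })
    }
}

/// One shared fetch path for all backends; Azure/GCP would only need
/// their own TryFrom impls, not their own fetch logic.
fn fetch_credential<T: TryFrom<DynamicCredentials, Error = String>>(
    options: &HashMap<String, String>,
) -> Result<T, String> {
    DynamicCredentials(Arc::new(options.clone())).try_into()
}
```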

Contributor Author


Yeah, I actually originally implemented it that way. But when I tried it on GCP it did not really work out, because GCP does not really provide remote temporary credentials; the generation of the temporary token and the signing are all done locally. I need to look into it more, so I ended up just doing it for AWS for now. Let me know if you prefer to first set it up this way.

Contributor


Ah okay. That's fine if it's hard to do. Can also be a future refactor.

ObjectStore has this internal TokenProvider trait they use. Not sure if you saw that already:

https://github.com/apache/arrow-rs-object-store/blob/b72c00fb6d9ec30355fea7085556e4df41646ff5/src/client/mod.rs#L847-L895

Comment on lines +240 to +241
storage_options_provider: Option<Arc<dyn StorageOptionsProvider>>,
expires_at_millis: Option<u64>,
Contributor


  1. It seems like storage_options_provider and credentials should be mutually exclusive? Either I want the credentials to be controlled by the storage_options_provider, or I want to pass my own custom AwsCredentialProvider.
  2. I'm not sure why we need expires_at_millis. My understanding is that the final AwsCredentialProvider::get_credentials() is called on every request, so the expiration could be handled internally, right?

Contributor Author

@jackye1995 jackye1995 Oct 30, 2025


This is actually intentionally not mutually exclusive. These fields are the initial values. When we first load a dataset, we already call namespace.describeTable once, which gives the table location plus the storage options containing the initial credentials and expiration time. I am leveraging the existing fields to pass those in, so that I don't need a duplicate namespace.describeTable call just to fetch the same set of credentials and expiration time again. After the initial usage, these fields are no longer respected.
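The flow being described — seed the cache with the credentials and expiry from the initial describeTable response, and only fall back to the provider for refreshes — can be sketched as follows. A synchronous simplification with hypothetical names, not the PR's actual cache type:

```rust
use std::collections::HashMap;

type Options = HashMap<String, String>;

/// Refresh cache seeded from the initial describeTable response, so the
/// first dataset load does not trigger a second, duplicate credential fetch.
struct SeededCache {
    cached: Option<(Options, u64)>, // (storage options, expires_at_millis)
}

impl SeededCache {
    fn new(initial: Option<(Options, u64)>) -> Self {
        SeededCache { cached: initial }
    }

    /// Return the cached options if still valid; otherwise call the provider
    /// and replace the cache (the seeded values are no longer respected).
    fn get(&mut self, now_millis: u64, provider: impl Fn() -> (Options, u64)) -> Options {
        match &self.cached {
            Some((opts, expires)) if now_millis < *expires => opts.clone(),
            _ => {
                let fresh = provider();
                let opts = fresh.0.clone();
                self.cached = Some(fresh);
                opts
            }
        }
    }
}
```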

Contributor


Ah okay. Could you explain that in a comment?

Contributor

@wjones127 wjones127 left a comment


Thanks for your responses. I think this is good, but I'd suggest adding some comments that explain the things I had questions about.

Contributor

@beinan beinan left a comment


We're looking forward to this feature, thanks!

@jackye1995 jackye1995 merged commit 95143fd into lance-format:main Oct 31, 2025
26 of 27 checks passed
jackye1995 added a commit that referenced this pull request Nov 16, 2025
This PR extends #4905 with 2 features:
1. When writing, the dataset might not exist yet, so we add a convenience
method `write_into_namespace` which calls `namespace.create_empty_table`
to create the empty table, gets its URI and storage options to use (if
any), and uses those to create the table.
2. Pass the storage options provider to:
    1. `write_fragments`, for distributed writes
    2. the Lance file writer, for callers that write individual Lance files
in the same table directory but not in the Lance table

note: this PR only offers the Rust and Python implementations; Java will be
added later
jackye1995 added a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…lder (lance-format#5045)

I ended up doing these in lance-format#4984 and
lance-format#4905 so I decided to pull it out
and get it cleaned up first.

This PR moves the directory namespace from using OpenDAL directly to using the Lance ObjectStore. This avoids the inconsistency between the dir namespace and the underlying Lance table storage configurations. Users can still use OpenDAL, and if we fully migrate Lance to OpenDAL it will be applied to both layers at the same time.

The PR also reworks the namespace builders in builder style and allows supplying a Lance session. Since we have not published a stable version yet, we do not care about backwards compatibility.

This PR also ensures the lance-namespace-impls features are consistent with the lance-io features. Related to
lance-format#5042
jackye1995 added a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…ending (lance-format#4905)

This PR introduces a new dynamic storage options provider interface in
the Lance dataset. The main idea is that the provider describes the
storage options to use for a given dataset and when those options will
expire. Lance is responsible for fetching a new set of storage options
and re-initializing the object store when expiration happens.

This is mainly useful for cases where the dataset's access credentials
are temporary and the user would like to invoke a specific credentials
endpoint to fetch a new set of credentials. Currently we have only added
support for AWS, by implementing a credentials provider.

The PR also provides an implementation of the `StorageOptionsProvider`
with Lance Namespace, because Lance Namespace provides a DescribeTable
endpoint which returns the storage options that should be used at the
given time. Based on the namespace spec, an `expires_at_millis` key can
be added to the storage options to indicate the expiration time of those
options.

Because Lance Namespace provides native implementations in Python and
Java, we also provide the binding interfaces `PyStorageOptionsProvider`
and `JavaStorageOptionsProvider`, which are then further integrated with
the Lance Namespace implementations in those languages.
jackye1995 added a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
This PR extends lance-format#4905 with 2 features:
1. When writing, the dataset might not exist yet, so we add a convenience
method `write_into_namespace` which calls `namespace.create_empty_table`
to create the empty table, gets its URI and storage options to use (if
any), and uses those to create the table.
2. Pass the storage options provider to:
    1. `write_fragments`, for distributed writes
    2. the Lance file writer, for callers that write individual Lance files
in the same table directory but not in the Lance table

note: this PR only offers the Rust and Python implementations; Java will be
added later

Labels

enhancement New feature or request java python

5 participants