fix: prevent cache collisions in ObjectStoreRegistry #5153

Merged
cmccabe merged 15 commits into main from colin_object_store_prefix
Nov 7, 2025

@cmccabe cmccabe commented Nov 5, 2025

ObjectStoreRegistry maintains a cache of ObjectStore objects. Previously, when caching an Azure blob store, the account name was not part of the cache key. This could cause collisions when we cached two Azure blob stores that used the same container name but different account names. Container names are not unique in Azure; only the combination of account name and container name is.
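To illustrate the collision described above, here is a minimal sketch using a plain `HashMap` as the cache. `StoreHandle`, both key functions, and the key formats are hypothetical stand-ins for illustration, not the actual ObjectStoreRegistry code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cached object store.
#[derive(Debug, Clone, PartialEq)]
struct StoreHandle {
    account: String,
    container: String,
}

/// Key built from the container alone (the pre-fix behavior described above).
/// The "az:" format is an assumption made for this sketch.
fn container_only_key(_account: &str, container: &str) -> String {
    format!("az:{}", container)
}

/// Key that also includes the account name.
fn account_and_container_key(account: &str, container: &str) -> String {
    format!("az:{}@{}", account, container)
}

/// Caches a "data" container for two different accounts using the given key function.
fn cache_two_stores(key_fn: fn(&str, &str) -> String) -> HashMap<String, StoreHandle> {
    let mut cache = HashMap::new();
    for &account in &["account1", "account2"] {
        let key = key_fn(account, "data");
        // The first insert wins; a later request with the same key reuses that entry.
        cache.entry(key).or_insert_with(|| StoreHandle {
            account: account.to_string(),
            container: "data".to_string(),
        });
    }
    cache
}
```

With `container_only_key`, both accounts map to a single entry, so a request for `account2` would be served the store created for `account1`. With `account_and_container_key`, the cache holds two distinct entries.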

The flow here is roughly that we determine what type of object store we're creating by looking at the first part of the URL (the scheme), and then use the associated object store Provider object to generate the appropriate store_prefix from the second part of the URL (the authority). We want to be able to do all this relatively quickly, without instantiating heavyweight objects, because the prefix is used as the cache key.

In the specific case of Azure, the account name may not even be in the URL at all. It can be passed as part of the storage options. Therefore, the function generating the object store prefix must also take the storage_options as a parameter. The final object store prefix we generate for Azure must include this account information, as well as the container information.
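A rough sketch of what such a prefix function might look like follows. The function name, the `account_name` option key, the precedence of the URL over the options, and the resulting prefix format are all assumptions made for illustration; the real provider code may differ on each of these points.

```rust
use std::collections::HashMap;

/// Hypothetical prefix builder for Azure URLs. It only parses strings, so it
/// stays cheap enough to run on every cache lookup.
fn azure_store_prefix(
    url: &str,
    storage_options: &HashMap<String, String>,
) -> Result<String, String> {
    let (scheme, rest) = url
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {}", url))?;
    // The authority is everything up to the first '/'.
    let authority = rest.split('/').next().unwrap_or("");
    // Form 1: container@account.<host suffix>; form 2: container only.
    let (container, account_from_url) = match authority.split_once('@') {
        Some((container, host)) => (container, host.split('.').next()),
        None => (authority, None),
    };
    if container.is_empty() {
        return Err(format!("missing container in {}", url));
    }
    // Assumption: prefer the account in the URL, then fall back to the options.
    let account = account_from_url
        .filter(|a| !a.is_empty())
        .or_else(|| storage_options.get("account_name").map(String::as_str))
        .ok_or_else(|| format!("no Azure account name for {}", url))?;
    Ok(format!("{}:{}@{}", scheme, account, container))
}
```

In this sketch, `az://data/path` with `account_name=account1` in the options yields `az:account1@data`, while the same URL with no account anywhere is rejected rather than silently sharing a key with another account.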

This PR modifies WrappingObjectStore.wrap() to take the store prefix as an argument, rather than the storage options. In general, the storage options are highly specific to the type of the object store, which the wrapper generally doesn't know. It would not make sense to look for Azure-specific storage options when wrapping an S3 store, for example.
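The shape of that change can be sketched as below. `Store`, `WrappingStore`, `RawStore`, `Labeled`, and `LabelingWrapper` are hypothetical stand-ins written for this example; the actual WrappingObjectStore trait and its exact signature are not reproduced here.

```rust
use std::sync::Arc;

/// Stand-in for the real object store trait; only what this sketch needs.
trait Store {
    fn describe(&self) -> String;
}

struct RawStore;

impl Store for RawStore {
    fn describe(&self) -> String {
        "raw".to_string()
    }
}

/// Hypothetical wrapper hook: it receives the already-computed store prefix,
/// so it never has to interpret provider-specific storage options.
trait WrappingStore {
    fn wrap(&self, store_prefix: &str, original: Arc<dyn Store>) -> Arc<dyn Store>;
}

/// A wrapped store that tags output with the prefix it was created for.
struct Labeled {
    label: String,
    inner: Arc<dyn Store>,
}

impl Store for Labeled {
    fn describe(&self) -> String {
        format!("[{}] {}", self.label, self.inner.describe())
    }
}

struct LabelingWrapper;

impl WrappingStore for LabelingWrapper {
    fn wrap(&self, store_prefix: &str, original: Arc<dyn Store>) -> Arc<dyn Store> {
        Arc::new(Labeled {
            label: store_prefix.to_string(),
            inner: original,
        })
    }
}
```

The wrapper works the same way for any provider, because the provider has already folded whatever identifies the store (for Azure, account plus container) into the prefix string.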

github-actions bot added the bug (Something isn't working) label Nov 5, 2025
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Comment thread rust/lance-io/src/object_store/providers.rs Outdated
Comment thread rust/lance-io/src/object_store/providers/azure.rs
@codecov-commenter

Codecov Report

❌ Patch coverage is 91.51786% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.85%. Comparing base (35a0547) to head (27ddccb).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-io/src/object_store.rs | 27.27% | 8 Missing ⚠️ |
| rust/lance-io/src/object_store/providers/azure.rs | 93.40% | 5 Missing and 1 partial ⚠️ |
| rust/lance-io/src/object_store/providers.rs | 93.90% | 3 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #5153    +/-   ##
========================================
  Coverage   81.85%   81.85%            
========================================
  Files         341      341            
  Lines      140548   140854   +306     
  Branches   140548   140854   +306     
========================================
+ Hits       115046   115298   +252     
- Misses      21694    21743    +49     
- Partials     3808     3813     +5     
| Flag | Coverage Δ |
|---|---|
| unittests | 81.85% <91.51%> (+<0.01%) ⬆️ |


@wjones127 wjones127 left a comment

This looks good. I like the idea of pushing the responsibility for creating a unique cache key onto the provider rather than exposing the storage options to the wrapper store 👍

@github-actions github-actions Bot added the java label Nov 7, 2025
@cmccabe cmccabe merged commit e34e83b into main Nov 7, 2025
30 of 31 checks passed
@cmccabe cmccabe deleted the colin_object_store_prefix branch November 7, 2025 21:11
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026