fix: prevent cache collisions in ObjectStoreRegistry by cmccabe · Pull Request #5153 · lance-format/lance

cmccabe · 2025-11-05T23:41:37Z

ObjectStoreRegistry maintains a cache of ObjectStore objects. Previously, when caching an Azure blob store, the account name was not part of the cache key. This could cause collisions when we tried to cache two Azure blob stores that used the same container name but different account names. Container names are not unique in Azure; only the combination of account name and container name is.
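
To make the collision concrete, here is a minimal sketch that uses a plain `HashMap` as a stand-in for the registry cache. `StoreHandle`, both key functions, and the `az$container@account` key layout are hypothetical illustrations, not the actual Lance types or format.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a cached object store; not a Lance type.
#[derive(Debug, Clone, PartialEq)]
pub struct StoreHandle {
    pub account: String,
    pub container: String,
}

/// Old-style key in this sketch: the container only, so the account is ignored.
pub fn key_without_account(store: &StoreHandle) -> String {
    format!("az${}", store.container)
}

/// Fixed key in this sketch: the account is part of the key as well.
pub fn key_with_account(store: &StoreHandle) -> String {
    format!("az${}@{}", store.container, store.account)
}

/// Inserts every store into a cache, keeping the first entry seen for each
/// key, and returns the resulting cache.
pub fn build_cache(
    stores: &[StoreHandle],
    key_fn: fn(&StoreHandle) -> String,
) -> HashMap<String, StoreHandle> {
    let mut cache = HashMap::new();
    for store in stores {
        cache.entry(key_fn(store)).or_insert_with(|| store.clone());
    }
    cache
}
```

With two stores that share the container `data` but belong to different accounts, `build_cache` with `key_without_account` ends up with a single entry, so the second lookup would return the first account's store. With `key_with_account`, both stores get their own entry.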

The flow here is roughly that we determine what type of object store we're creating by looking at the first part of the URL (the scheme), and then use the associated object store Provider object to generate the appropriate store_prefix from the second part of the URL (the authority). We want to be able to do all this relatively quickly, without instantiating heavyweight objects, to support the cache key use-case.
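
The flow above can be sketched roughly as follows. This is an illustration under stated assumptions: the `PrefixProvider` trait, `SimpleProvider`, `split_url`, and `cache_prefix` are hypothetical names, and the URL is split with plain string operations to show that no heavyweight object needs to be built.

```rust
use std::collections::HashMap;

/// Hypothetical provider trait: computes a cache prefix without building a store.
pub trait PrefixProvider {
    fn store_prefix(&self, scheme: &str, authority: &str) -> String;
}

/// Default behaviour in this sketch: scheme and authority identify the store.
pub struct SimpleProvider;

impl PrefixProvider for SimpleProvider {
    fn store_prefix(&self, scheme: &str, authority: &str) -> String {
        format!("{}${}", scheme, authority)
    }
}

/// Splits "scheme://authority/path" into (scheme, authority) using only
/// string operations. Returns None if the "://" separator is missing.
pub fn split_url(url: &str) -> Option<(&str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    let authority = rest.split('/').next().unwrap_or("");
    Some((scheme, authority))
}

/// Looks up the provider registered for the URL scheme and asks it for the
/// prefix. Returns None for malformed URLs or unknown schemes.
pub fn cache_prefix(
    providers: &HashMap<String, Box<dyn PrefixProvider>>,
    url: &str,
) -> Option<String> {
    let (scheme, authority) = split_url(url)?;
    let provider = providers.get(scheme)?;
    Some(provider.store_prefix(scheme, authority))
}
```

Registering `SimpleProvider` under `"s3"` and calling `cache_prefix` with `s3://bucket/path/to/data` yields `s3$bucket` in this sketch; a scheme with no registered provider yields `None`.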

In the specific case of Azure, the account name may not even be in the URL at all. It can be passed as part of the storage options. Therefore, the function generating the object store prefix must also take the storage_options as a parameter. The final object store prefix we generate for Azure must include this account information, as well as the container information.
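
A minimal sketch of an Azure-style prefix function that consults the storage options is shown below. The function name, the `account_name` option key, the `container@account-host` authority handling, and the output layout are all assumptions made for illustration; the real provider may resolve the account differently.

```rust
use std::collections::HashMap;

/// Hypothetical option key used in this sketch for the Azure account name.
pub const ACCOUNT_KEY: &str = "account_name";

/// Builds a cache prefix for an Azure-style URL authority.
///
/// The account is taken from the authority when it has the form
/// "container@account-host"; otherwise it falls back to the storage options.
/// Returns None when no account can be determined.
pub fn azure_store_prefix(
    authority: &str,
    storage_options: Option<&HashMap<String, String>>,
) -> Option<String> {
    let (container, account) = match authority.split_once('@') {
        // Account embedded in the URL: keep the first DNS label of the host.
        Some((container, host)) => {
            let account = host.split('.').next().unwrap_or(host);
            (container, account.to_string())
        }
        // Account only available through the storage options.
        None => {
            let account = storage_options?.get(ACCOUNT_KEY)?.clone();
            (authority, account)
        }
    };
    Some(format!("az${}@{}", container, account))
}
```

In this sketch, the same container name with two different `account_name` options produces two different prefixes, which is the property the cache key needs.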

This PR modifies WrappingObjectStore.wrap() to take the store prefix as an argument, rather than the storage options. In general, the storage options are highly specific to the type of the object store, which the wrapper generally doesn't know. It would not make sense to look for Azure-specific storage options when wrapping an S3 store, for example.
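
The shape of such a wrapper interface might look like the sketch below. The `Store` and `Wrapper` traits and the `TaggingWrapper` type are hypothetical simplifications and do not reproduce the actual `WrappingObjectStore` signature; the point is only that the wrapper receives an opaque prefix string and never inspects provider-specific options.

```rust
use std::sync::Arc;

/// Hypothetical minimal store interface; stands in for a real object store.
pub trait Store: Send + Sync {
    fn describe(&self) -> String;
}

/// Hypothetical wrapper trait: receives the opaque store prefix, not the
/// provider-specific storage options.
pub trait Wrapper {
    fn wrap(&self, store_prefix: &str, inner: Arc<dyn Store>) -> Arc<dyn Store>;
}

/// Trivial store used as the thing being wrapped.
pub struct BaseStore;

impl Store for BaseStore {
    fn describe(&self) -> String {
        "base".to_string()
    }
}

/// A wrapper that tags the store with its prefix, e.g. for per-store metrics.
pub struct TaggingWrapper;

struct TaggedStore {
    prefix: String,
    inner: Arc<dyn Store>,
}

impl Store for TaggedStore {
    fn describe(&self) -> String {
        format!("{} [{}]", self.inner.describe(), self.prefix)
    }
}

impl Wrapper for TaggingWrapper {
    fn wrap(&self, store_prefix: &str, inner: Arc<dyn Store>) -> Arc<dyn Store> {
        Arc::new(TaggedStore {
            prefix: store_prefix.to_string(),
            inner,
        })
    }
}
```

The same `TaggingWrapper` works unchanged for an Azure prefix and an S3 prefix, because all provider-specific knowledge was already folded into the prefix string by whoever computed it.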

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

codecov-commenter · 2025-11-06T20:17:51Z

Codecov Report

❌ Patch coverage is 91.51786% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.85%. Comparing base (35a0547) to head (27ddccb).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
rust/lance-io/src/object_store.rs	27.27%	8 Missing ⚠️
rust/lance-io/src/object_store/providers/azure.rs	93.40%	5 Missing and 1 partial ⚠️
rust/lance-io/src/object_store/providers.rs	93.90%	3 Missing and 2 partials ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #5153    +/-   ##
========================================
  Coverage   81.85%   81.85%            
========================================
  Files         341      341            
  Lines      140548   140854   +306     
  Branches   140548   140854   +306     
========================================
+ Hits       115046   115298   +252     
- Misses      21694    21743    +49     
- Partials     3808     3813     +5

Flag	Coverage Δ
unittests	`81.85% <91.51%> (+<0.01%)`	⬆️

wjones127

This looks good. I like the idea of pushing the responsibility for creating a unique cache key onto the provider rather than exposing the storage options to the wrapper store 👍

github-actions Bot added the bug Something isn't working label Nov 5, 2025

chatgpt-codex-connector Bot reviewed Nov 5, 2025

Comment thread rust/lance-io/src/object_store/providers.rs Outdated

Comment thread rust/lance-io/src/object_store/providers/azure.rs

cmccabe added 5 commits November 5, 2025 21:54

fixes etc

6bb00bb

more fixes

09e1429

providers.rs: some tests and fixes

eea1c17

fmt etc

2572bf0

azure.rs: add some tests

27ddccb

cmccabe added 7 commits November 6, 2025 12:39

Merge branch 'main' into colin_object_store_prefix

dafb876

format

51bcefb

fix typo

fadea9f

fix clippy

ba5bc87

fix fmt

1de9fc9

Merge branch 'main' into colin_object_store_prefix

5f77200

change

488e50f

wjones127 approved these changes Nov 7, 2025

fix java exception catching

f6a36f6

github-actions Bot added the java label Nov 7, 2025

Merge branch 'main' into colin_object_store_prefix

3abc812

cmccabe merged commit e34e83b into main Nov 7, 2025
30 of 31 checks passed

cmccabe deleted the colin_object_store_prefix branch November 7, 2025 21:11

andrea-reale mentioned this pull request Mar 30, 2026

emilk/fix write starvation rerun-io/lance#12

Closed

cmccabe merged 15 commits into main from colin_object_store_prefix
