feat(storage): implement opendal resolving storage #2231
blackmwk merged 9 commits into apache:main
props: HashMap<String, String>,
/// Cache of scheme → storage mappings.
#[serde(skip, default)]
storages: RwLock<HashMap<String, Arc<OpenDalStorage>>>,
Should this be a multimap? For example, we may need to support both s3 and s3a for S3 storage.
The key here is a string representing a scheme, so you can have both in one map:
("s3", OpenDalS3Storage),
("s3a", AnotherOpenDalS3Storage)
Or are we thinking of mapping one scheme to multiple storages?
I was thinking we should map them to the same storage? A storage instance holds a lot of resources, such as a connection pool.
Currently there is a configured_scheme for OpenDalStorage::{S3, Azdls}. The path it handles must match the configured scheme, so technically paths with different schemes can't use the same storage instance.
https://github.com/apache/iceberg-rust/blob/main/crates/storage/opendal/src/lib.rs#L110
I'm not quite sure about the reason though; maybe it's an OpenDAL limitation? I think we can improve this in a different PR if needed.
IIRC the configured_scheme was a legacy setting from before we refactored the Storage trait. I think we no longer need this field since Storage now accepts the full URL. Please create an issue to track it.
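To make the idea in this thread concrete, here is a minimal sketch of a scheme-keyed cache in which aliases such as s3 and s3a resolve to one shared instance. It is not the code from this PR: `Scheme`, `parse_scheme`, `DummyStorage`, and `ResolvingCache` are hypothetical names, the alias list is illustrative, and `DummyStorage` only stands in for a real storage that would own a connection pool.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical canonical scheme; aliases collapse to a single variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Scheme {
    S3,
    Memory,
}

/// Map a raw URL scheme to its canonical variant (illustrative alias list).
fn parse_scheme(raw: &str) -> Option<Scheme> {
    match raw {
        "s3" | "s3a" => Some(Scheme::S3),
        "memory" => Some(Scheme::Memory),
        _ => None,
    }
}

/// Stand-in for a real storage that owns expensive resources.
#[derive(Debug)]
struct DummyStorage {
    scheme: Scheme,
}

/// Cache keyed by canonical scheme, so aliases share one `Arc`.
#[derive(Default)]
struct ResolvingCache {
    storages: RwLock<HashMap<Scheme, Arc<DummyStorage>>>,
}

impl ResolvingCache {
    /// Resolve a full path such as `s3a://bucket/key` to its storage.
    fn resolve(&self, path: &str) -> Option<Arc<DummyStorage>> {
        let (raw, _) = path.split_once("://")?;
        let scheme = parse_scheme(raw)?;

        // Fast path: look up under a shared read lock.
        {
            let map = self.storages.read().unwrap();
            if let Some(storage) = map.get(&scheme) {
                return Some(Arc::clone(storage));
            }
        }

        // Slow path: take the write lock. `entry` covers the case where
        // another thread inserted between the two lock acquisitions.
        let mut map = self.storages.write().unwrap();
        let storage = map
            .entry(scheme)
            .or_insert_with(|| Arc::new(DummyStorage { scheme }));
        Some(Arc::clone(storage))
    }
}
```

With this shape, `resolve("s3://bucket/a")` and `resolve("s3a://bucket/b")` return clones of the same `Arc`, so the underlying resources are created once per backend rather than once per alias.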
## Which issue does this PR close?

- Closes apache#2210

## What changes are included in this PR?

- Add OpenDalResolvingStorage

## Are these changes tested?

Added a new test
(cherry picked from commit ffd6454)
…e impls (#2338)

## Which issue does this PR close?

- Closes #2245
- Related #2231

## What changes are included in this PR?

- Remove the configured_scheme field from OpenDalStorage::{S3,Azdls}
- Make S3 storage use the scheme in the file paths, allowing for custom S3-compatible schemes like minio://
- Use `HashMap<Scheme, Arc<OpenDalStorage>>` so aliases share a storage instance
- Add new unit tests and remove some now-obsolete scheme-mismatch tests

## Breaking Change

We are removing a struct field from public types, so this needs to be released in 0.10.0.

```rust
// Before
OpenDalStorageFactory::S3 {
    configured_scheme: "s3a".to_string(),
    customized_credential_load: None,
}
OpenDalStorageFactory::Azdls {
    configured_scheme: AzureStorageScheme::Abfss,
}

// After
OpenDalStorageFactory::S3 { customized_credential_load: None }
OpenDalStorageFactory::Azdls
```

## Are these changes tested?

Beyond the unit tests, I ran these integration tests.

```sh
docker compose -f dev/docker-compose.yaml up -d --wait
# requires unset on any AWS_ env vars
cargo test -p iceberg-integration-tests
cargo test -p iceberg-catalog-hms --test hms_catalog_test
cargo test -p iceberg-catalog-loader
cargo test -p iceberg-storage-opendal --features opendal-s3 --test file_io_s3_test
```
…e impls (apache#2338) (cherry picked from commit fda82a2)