feat(flo-cloud): add Azure Blob Storage provider (#249)
Conversation
- Adds AzureBlobStorage as a third cloud storage backend alongside AWS S3 and GCP GCS.
- Implements the full CloudStorageHandler interface (get, upload, delete, list, presigned SAS URLs, text writer).
- Supports both Service Principal (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID) and DefaultAzureCredential authentication.
- Requires AZURE_STORAGE_ACCOUNT_URL to point to the storage account endpoint.
- Also adds azure-storage-blob and azure-identity dependencies.
⚠️ Warning: Rate limit exceeded
⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered again. We recommend that you space out your commits to avoid hitting the rate limit.
🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.
ℹ️ Review info · Configuration used: defaults · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (1)
📝 Walkthrough
Adds Azure Blob Storage support.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant App as AzureBlobStorage
    participant AD as Azure AD (credential)
    participant Blob as Blob Service
    Client->>App: request presigned URL (bucket,key,method,expiresIn)
    App->>AD: request user delegation key (start,expiry)
    AD-->>App: user delegation key
    App->>Blob: build SAS using delegation key and permissions
    Blob-->>App: SAS token / confirmation
    App-->>Client: full SAS URL
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
🚥 Pre-merge checks | ✅ 3 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🧹 Nitpick comments (4)
wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py (2)
194-210: Add return type annotation. The method returns a tuple but lacks a type annotation; it should be `Tuple[str, str]` for consistency with the documented return value.

📝 Suggested fix

```diff
- def get_bucket_key(self, value: str):
+ def get_bucket_key(self, value: str) -> Tuple[str, str]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py` around lines 194 - 210, get_bucket_key currently lacks a return type annotation; update the signature of get_bucket_key to declare it returns Tuple[str, str] and, if Tuple isn't already imported, add it from typing so the function signature reads with the correct type hint; keep the function body unchanged and ensure any linters/types checkers pick up the new annotation.
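As a sketch of the suggested annotation: the body of `get_bucket_key` is not shown in this diff, so the `container/key` split below is a hypothetical stand-in; only the `-> Tuple[str, str]` annotation comes from the review.

```python
from typing import Tuple


def get_bucket_key(value: str) -> Tuple[str, str]:
    """Split a 'container/blob-path' value into (bucket, key).

    The parsing itself is illustrative; the annotation is the point.
    """
    bucket, _, key = value.partition('/')
    return bucket, key
```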
212-213: Parameter `type` shadows Python builtin. Static analysis flags `type` as shadowing the builtin. While this matches the interface signature in `CloudStorageHandler`, consider renaming it to `operation` or `method` in a future refactor across all implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py` around lines 212 - 213, the parameter name `type` in generate_presigned_url shadows the Python builtin; update the signature in generate_presigned_url to use a non-shadowing name like `operation` (or `method`) and update all internal references within this function (and any unit tests or callers in this module) to use the new name; also leave a TODO or comment noting that the CloudStorageHandler interface should be updated in a follow-up so implementations remain consistent across providers.

wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py (2)
86-92: Azure type conversion duplicates GCP logic. Lines 86-92 are identical to lines 79-85 (GCP). Consider extracting a shared helper for providers that use GET/PUT/POST conventions, but this duplication is acceptable for now given the clarity it provides.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py` around lines 86 - 92, The Azure branch duplicates the same GET/PUT/POST mapping logic used for GCP; extract the shared mapping into a helper (e.g. _map_simple_http_method(type) or map_simple_provider_method) and call it from both the CloudProvider.GCP and CloudProvider.AZURE branches to remove duplication; the helper should return 'GET' for types 'get'/'get_object', 'PUT' for 'put'/'put_object', and 'POST' for 'post'/'post_object', and then replace the existing conditional blocks in the code that reference CloudProvider.GCP and CloudProvider.AZURE to call this helper.
35-36: Inconsistent credential handling across providers.
`AzureBlobStorage` receives `**credentials` while `S3Storage()` and `GCSStorage()` are instantiated without any. If the other providers also support credential injection (e.g., for explicit key-based auth), consider passing credentials consistently. Otherwise, this asymmetry is acceptable since Azure requires explicit account URL configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py` around lines 35 - 36, The branch handling CloudProvider.AZURE is inconsistent because AzureBlobStorage is constructed with **credentials while S3Storage() and GCSStorage() are created without; either pass credentials to all providers or stop forwarding them to Azure. Inspect the AzureBlobStorage constructor signature and, if other providers accept explicit credentials, change the S3Storage() and GCSStorage() instantiations to S3Storage(**credentials) and GCSStorage(**credentials); otherwise remove **credentials from AzureBlobStorage and build its required account_url/connection parameters explicitly (or map the credentials into the specific Azure args) so credential handling is consistent across CloudProvider.AZURE, CloudProvider.S3 and CloudProvider.GCS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py`:
- Around line 288-300: The current list_files implementation uses islice over
blobs_iterator which iterates through all preceding items; change it to use
Azure server-side pagination: call
container_client.list_blobs(name_starts_with=prefix,
results_per_page=page_size).by_page() (use results_per_page and by_page on the
pages iterator), validate page_number and page_size >= 1, advance the pages
iterator page_number-1 times catching StopIteration to return empty results,
then materialize the current page into blob_names via [blob.name for blob in
current_page], and determine has_next by attempting next(pages) (catch
StopIteration to set False); update the logic in list_files to use these symbols
(list_files, container_client.list_blobs, results_per_page, by_page, pages)
instead of islice.
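The page-advancing control flow above can be sketched provider-agnostically. Here `pages` stands in for the iterator returned by `container_client.list_blobs(...).by_page()`, and plain strings stand in for blob objects; this illustrates the logic, not the SDK call itself:

```python
from typing import Iterator, List, Tuple


def paginate(pages: Iterator[List[str]], page_number: int) -> Tuple[List[str], bool]:
    """Advance a by_page()-style iterator to page_number (1-based).

    Returns (items_on_page, has_next); ([], False) when past the end.
    """
    if page_number < 1:
        raise ValueError('page_number must be >= 1')
    try:
        # Skip the pages before the requested one, then take the current page.
        for _ in range(page_number - 1):
            next(pages)
        current_page = next(pages)
    except StopIteration:
        return [], False
    blob_names = list(current_page)
    # Probe for a following page to report has_next.
    try:
        next(pages)
        has_next = True
    except StopIteration:
        has_next = False
    return blob_names, has_next
```

Unlike `islice` over a flat blob iterator, the server returns only the requested page, so deep pages do not force iteration over all preceding blobs.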
- Around line 62-69: The code currently silently falls back to
DefaultAzureCredential when only a subset of client credentials are provided;
update the credential selection logic so that if any of tenant_id, client_id, or
client_secret are provided but not all three, you raise a clear ValueError (or
appropriate exception) instead of using DefaultAzureCredential. In the
blob_storage credential creation block (where ClientSecretCredential and
DefaultAzureCredential are used and the variable credential is assigned),
validate that either all three values are truthy (then construct
ClientSecretCredential with tenant_id, client_id, client_secret) or none are
provided (then use DefaultAzureCredential); if a partial set is present, raise
an error mentioning the missing fields.
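The all-or-none check might look like the following sketch; it returns a label instead of constructing a real azure-identity credential so the selection logic is shown in isolation, and the function name is hypothetical:

```python
from typing import Optional


def select_credential(tenant_id: Optional[str],
                      client_id: Optional[str],
                      client_secret: Optional[str]) -> str:
    """Pick a credential strategy, rejecting partial service-principal config."""
    provided = {'tenant_id': tenant_id, 'client_id': client_id,
                'client_secret': client_secret}
    missing = [name for name, value in provided.items() if not value]
    if not missing:
        return 'ClientSecretCredential'    # all three present
    if len(missing) == 3:
        return 'DefaultAzureCredential'    # none provided: fall back
    raise ValueError(
        f'Partial service principal configuration; missing: {", ".join(missing)}')
```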
- Around line 112-115: Exception handlers in functions like get_file,
save_large_file, save_small_file, delete_file, generate_presigned_url, and
list_files currently re-raise new exceptions without chaining the original
exception; update each except block to use "raise <NewException>(...) from e"
(e.g., raise CloudStorageFileNotFoundError(bucket_name, file_path) from e or
raise Exception(f'...: {e}') from e) so the original traceback is preserved and
included in the new exception; locate the except clauses that catch
ResourceNotFoundError and general Exception in those functions and replace their
raise statements to include "from e".
In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py`:
- Around line 152-159: The file_protocol method currently claims to return str
but returns None for unknown providers; change it to always return a str by
replacing the final "return None" path with raising a ValueError (include the
unknown self.provider in the message) so callers of file_protocol() never get
None; update the file_protocol signature if you prefer the alternative (->
Optional[str]) and handle callers accordingly, but the recommended fix is to
modify file_protocol to raise ValueError for unsupported CloudProvider values.
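Sketched with a minimal stand-in enum (the real `CloudProvider` lives in flo_cloud, and the dict-based dispatch is an assumption), the fail-fast variant looks like:

```python
from enum import Enum


class CloudProvider(Enum):
    # Minimal stand-in for the enum defined in flo_cloud.
    AWS = 'aws'
    GCP = 'gcp'
    AZURE = 'azure'


_PROTOCOLS = {
    CloudProvider.AWS: 's3',
    CloudProvider.GCP: 'gs',
    CloudProvider.AZURE: 'azure',
}


def file_protocol(provider: CloudProvider) -> str:
    """Fail-fast variant: unknown providers raise instead of returning None."""
    try:
        return _PROTOCOLS[provider]
    except KeyError:
        raise ValueError(f'Unsupported provider: {provider}') from None
```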
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aa3d1ce4-ba50-419e-834c-c8c9e3d0a727
⛔ Files ignored due to path filters (1)
`wavefront/server/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- wavefront/server/packages/flo_cloud/flo_cloud/azure/__init__.py
- wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py
- wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py
- wavefront/server/packages/flo_cloud/pyproject.toml
```python
except ResourceNotFoundError:
    raise CloudStorageFileNotFoundError(bucket_name, file_path)
except Exception as e:
    raise Exception(f'Error reading file from Azure Blob Storage: {str(e)}')
```
Preserve exception chain with raise ... from e.
Static analysis correctly flags this pattern. When re-raising exceptions, use raise ... from e to preserve the original traceback for debugging. This applies to all exception handlers in this file (lines 113, 115, 144-146, 172-174, 190, 192, 260-262, 303, 305).
🔧 Example fix for get_file
```diff
 except ResourceNotFoundError:
-    raise CloudStorageFileNotFoundError(bucket_name, file_path)
+    raise CloudStorageFileNotFoundError(bucket_name, file_path) from None
 except Exception as e:
-    raise Exception(f'Error reading file from Azure Blob Storage: {str(e)}')
+    raise Exception(f'Error reading file from Azure Blob Storage: {e!r}') from e
```

Apply the same pattern to save_large_file, save_small_file, delete_file, generate_presigned_url, and list_files.
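To illustrate the chaining pattern end to end with stand-ins: `_download` and the `KeyError` below are hypothetical substitutes for the Azure SDK call and its `ResourceNotFoundError`, chosen so the example runs without the SDK.

```python
class CloudStorageFileNotFoundError(Exception):
    """Stand-in for the project's real exception type."""


def _download(bucket: str, path: str) -> bytes:
    # Hypothetical stand-in for the Azure SDK download call.
    raise KeyError(path)  # simulates ResourceNotFoundError for a missing blob


def get_file(bucket_name: str, file_path: str) -> bytes:
    try:
        return _download(bucket_name, file_path)
    except KeyError as e:  # ResourceNotFoundError in the real code
        raise CloudStorageFileNotFoundError(bucket_name, file_path) from e
    except Exception as e:
        raise Exception(f'Error reading file from Azure Blob Storage: {e!r}') from e
```

With `from e`, the original error survives on `__cause__`, so tracebacks show both the SDK failure and the wrapper.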
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 113-113: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 114-114: Do not catch blind exception: Exception
(BLE001)
[warning] 115-115: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 115-115: Use explicit conversion flag
Replace with conversion flag
(RUF010)
Actionable comments posted: 1
🧹 Nitpick comments (2)
wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py (1)
83-91: Regex only matches public Azure cloud endpoints. The pattern `https://([^.]+)\.blob\.core\.windows\.net` won't match Azure Government (`.blob.core.usgovcloudapi.net`) or China (`.blob.core.chinacloudapi.cn`) endpoints. If sovereign cloud support isn't needed now, consider adding a TODO or documenting this limitation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py` around lines 83 - 91, _parse_account_name currently only handles public Azure endpoints; update it to support sovereign clouds or document the limitation. Either broaden the regex in _parse_account_name to accept variants like .blob.core.windows.net, .blob.core.usgovcloudapi.net, and .blob.core.chinacloudapi.cn (ensuring match.group(1) still returns the account name), or add a TODO/comment above _parse_account_name stating that sovereign cloud endpoints are not supported, and validate/raise a clearer error if an unsupported domain is provided.

wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py (1)
152-159: Return type updated but `None` fallback remains; consider raising for unknown providers. The signature now correctly reflects `Optional[str]`, addressing the past type mismatch. However, returning `None` for unrecognized providers could cause issues for callers that don't handle it (e.g., the test fixture in `conftest.py` mocks this with a string return). Consider raising a `ValueError` for unsupported providers to fail fast, aligning with the error-raising pattern in `_convert_to_valid_type`.

🛠️ Suggested alternative

```diff
 def file_protocol(self) -> Optional[str]:
     if self.provider == CloudProvider.AWS:
         return 's3'
     elif self.provider == CloudProvider.GCP:
         return 'gs'
     elif self.provider == CloudProvider.AZURE:
         return 'azure'
-    return None
+    raise ValueError(f'Unsupported provider: {self.provider}')
```

If raising is adopted, update the return type back to `str`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py` around lines 152 - 159, the file_protocol method currently returns None for unknown providers which can hide errors; change file_protocol to raise a ValueError for unsupported CloudProvider values (mirror the error-raising behavior of _convert_to_valid_type) and update its signature to return str (not Optional[str]); update any callers/tests (e.g., the fixture in conftest.py) that mock file_protocol to return a string or to expect the raised ValueError so tests reflect the new fail-fast behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py`:
- Around line 262-263: The constructed blob URL uses the raw blob name (key)
which can contain unsafe characters; update the blob URL assembly to URL-encode
the blob key path component before concatenation. Use urllib.parse.quote(key,
safe='/') (or equivalent URL path encoding) when building blob_url so
_account_url, bucket_name and encoded key are combined, then append
?{sas_token}; keep the sas_token unencoded and only encode the path component
used in blob_url.
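With the standard library, the encoding step this prompt describes might look like the sketch below; the argument names echo the prompt, but the function shape itself is an assumption:

```python
from urllib.parse import quote


def build_blob_url(account_url: str, bucket_name: str, key: str, sas_token: str) -> str:
    """Encode only the blob-path component; the SAS token is appended as-is."""
    encoded_key = quote(key, safe='/')  # keep path separators, escape the rest
    return f'{account_url}/{bucket_name}/{encoded_key}?{sas_token}'
```

`safe='/'` preserves the blob's virtual-directory structure while escaping characters like spaces and `#` that would otherwise break the URL.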
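A broadened pattern along the lines the endpoint nitpick suggests could look like this; the exact endpoint list is an assumption, so verify it against the clouds the service must support:

```python
import re

# Matches public, US Government, and China blob endpoints.
_ACCOUNT_RE = re.compile(
    r'https://([^.]+)\.blob\.core\.(windows\.net|usgovcloudapi\.net|chinacloudapi\.cn)'
)


def parse_account_name(account_url: str) -> str:
    """Extract the storage account name, failing loudly on unknown domains."""
    match = _ACCOUNT_RE.match(account_url)
    if not match:
        raise ValueError(f'Unsupported storage account URL: {account_url}')
    return match.group(1)
```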
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 84946d72-9d7d-4104-92e5-a869c421b02e
📒 Files selected for processing (2)
- wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py
- wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py
* feat(flo-cloud): add Azure Blob Storage provider
  - Adds AzureBlobStorage as a third cloud storage backend alongside AWS S3 and GCP GCS.
  - Implements the full CloudStorageHandler interface (get, upload, delete, list, presigned SAS URLs, text writer).
  - Supports both Service Principal (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID) and DefaultAzureCredential authentication.
  - Requires AZURE_STORAGE_ACCOUNT_URL to point to the storage account endpoint.
  - Also adds azure-storage-blob and azure-identity dependencies.
* resolved review comments
* url encoding blob key
Summary by CodeRabbit