
feat(flo-cloud): add Azure Blob Storage provider #249

Merged — vizsatiz merged 3 commits into develop from CU-86d2d4mer-Azure-blob-storage-support on Mar 24, 2026

Conversation

rootflo-hardik (Contributor) commented Mar 24, 2026

  • Adds AzureBlobStorage as a third cloud storage backend alongside AWS S3 and GCP GCS.
  • Implements the full CloudStorageHandler interface (get, upload, delete, list, presigned SAS URLs, text writer).
  • Supports both Service Principal (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID) and DefaultAzureCredential authentication.
  • Requires AZURE_STORAGE_ACCOUNT_URL to point to the storage account endpoint.
  • Also adds azure-storage-blob and azure-identity dependencies.
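The credential-selection behavior described above can be sketched as follows. This is a minimal, SDK-free illustration; `select_credential_mode` is a hypothetical helper name, not a symbol from flo_cloud, and in the real class the return values would correspond to constructing `ClientSecretCredential` or `DefaultAzureCredential` from azure-identity.

```python
def select_credential_mode(tenant_id=None, client_id=None, client_secret=None):
    """Decide which Azure credential to build from the given settings."""
    provided = (tenant_id, client_id, client_secret)
    if all(provided):
        # All three Service Principal values set: build ClientSecretCredential.
        return "service_principal"
    if any(provided):
        # A partial set is almost certainly a misconfiguration; fail loudly
        # instead of silently falling back to DefaultAzureCredential.
        missing = [name for name, value in
                   zip(("tenant_id", "client_id", "client_secret"), provided)
                   if not value]
        raise ValueError(f"Incomplete Service Principal config, missing: {missing}")
    # Nothing set: let DefaultAzureCredential walk its usual chain
    # (environment variables, managed identity, Azure CLI, ...).
    return "default"
```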

Summary by CodeRabbit

  • New Features
    • Added Azure Blob Storage support for storing, retrieving, listing, and deleting files.
    • Supports both small and large uploads, text writer context for UTF-8 uploads, and MIME type handling.
    • Generates secure presigned (SAS) URLs with method-specific permissions and expiry.
    • File listing supports pagination (page size and page number) and indicates if more pages exist.

coderabbitai (bot) commented Mar 24, 2026

Warning

Rate limit exceeded

@rootflo-hardik has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 4 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fca135c7-4bf6-4aa5-a3fd-70bd2848241c

📥 Commits

Reviewing files that changed from the base of the PR and between 0f8c809 and ee8b209.

📒 Files selected for processing (1)
  • wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py
📝 Walkthrough

Adds Azure Blob Storage support: new AzureBlobStorage class and package initializer, integrates it into the cloud storage factory/manager, and adds Azure SDK dependencies.

Changes

  • Azure package & implementation — wavefront/server/packages/flo_cloud/flo_cloud/azure/__init__.py, wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py: New Azure package initializer and AzureBlobStorage class implementing blob operations (get, upload small/large, delete, list, presigned URL via user-delegation SAS, text writer), path parsing for azure://, and Azure credential selection.
  • Cloud storage integration — wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py: Factory updated to instantiate AzureBlobStorage for CloudProvider.AZURE; _convert_to_valid_type extended with Azure operation mappings; file_protocol return type widened and now returns 'azure' for the Azure provider.
  • Dependencies — wavefront/server/packages/flo_cloud/pyproject.toml: Added azure-identity>=1.17.0 and azure-storage-blob>=12.20.0 to project dependencies.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant App as AzureBlobStorage
  participant AD as Azure AD (credential)
  participant Blob as Blob Service

  Client->>App: request presigned URL (bucket,key,method,expiresIn)
  App->>AD: request user delegation key (start,expiry)
  AD-->>App: user delegation key
  App->>Blob: build SAS using delegation key and permissions
  Blob-->>App: SAS token / confirmation
  App-->>Client: full SAS URL
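Two SDK-independent pieces of the SAS flow in the diagram above can be sketched as follows: computing the token's validity window and mapping the HTTP method to a permission string. The names `sas_window` and `method_to_permissions` are illustrative assumptions, not the PR's actual symbols; the real implementation would pass these into azure-storage-blob's SAS generation.

```python
from datetime import datetime, timedelta, timezone

def sas_window(expires_in_seconds: int):
    """Compute the (start, expiry) pair for a short-lived SAS token."""
    # Backdate the start a few minutes to tolerate clock skew between
    # the caller and Azure.
    start = datetime.now(timezone.utc) - timedelta(minutes=5)
    expiry = datetime.now(timezone.utc) + timedelta(seconds=expires_in_seconds)
    return start, expiry

# Method-specific permissions: read for GET, create+write for uploads.
method_to_permissions = {"GET": "r", "PUT": "cw", "POST": "cw"}
```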

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰 I hopped into code with a curious twitch,
Found Azure blobs in a shimmering stitch—
I upload and fetch, then stamp a key,
Now files dance freely, thanks to me! ☁️✨

🚥 Pre-merge checks | ✅ 3 passed
  • Description Check — Passed (check skipped: CodeRabbit's high-level summary is enabled).
  • Title Check — Passed: the title 'feat(flo-cloud): add Azure Blob Storage provider' clearly and concisely summarizes the main change.
  • Docstring Coverage — Passed: 84.21% coverage, above the required 80.00% threshold.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai (bot) left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py (2)

194-210: Add return type annotation.

The method returns a tuple but lacks a type annotation. This should be Tuple[str, str] for consistency with the documented return value.

📝 Suggested fix
-    def get_bucket_key(self, value: str):
+    def get_bucket_key(self, value: str) -> Tuple[str, str]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py` around
lines 194 - 210, get_bucket_key currently lacks a return type annotation; update
the signature of get_bucket_key to declare it returns Tuple[str, str] and, if
Tuple isn't already imported, add it from typing so the function signature reads
with the correct type hint; keep the function body unchanged and ensure any
linters/types checkers pick up the new annotation.

212-213: Parameter type shadows Python builtin.

Static analysis flags type as shadowing the builtin. While this matches the interface signature in CloudStorageHandler, consider renaming to operation or method in a future refactor across all implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py` around
lines 212 - 213, The parameter name `type` in generate_presigned_url shadows the
Python builtin; update the signature in generate_presigned_url to use a
non-shadowing name like `operation` (or `method`) and update all internal
references within this function (and any unit tests or callers in this module)
to use the new name; also leave a TODO or comment noting that the
CloudStorageHandler interface should be updated in a follow-up so
implementations remain consistent across providers.
wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py (2)

86-92: Azure type conversion duplicates GCP logic.

Lines 86-92 are identical to lines 79-85 (GCP). Consider extracting a shared helper for providers that use GET/PUT/POST conventions, but this duplication is acceptable for now given the clarity it provides.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py` around lines
86 - 92, The Azure branch duplicates the same GET/PUT/POST mapping logic used
for GCP; extract the shared mapping into a helper (e.g.
_map_simple_http_method(type) or map_simple_provider_method) and call it from
both the CloudProvider.GCP and CloudProvider.AZURE branches to remove
duplication; the helper should return 'GET' for types 'get'/'get_object', 'PUT'
for 'put'/'put_object', and 'POST' for 'post'/'post_object', and then replace
the existing conditional blocks in the code that reference CloudProvider.GCP and
CloudProvider.AZURE to call this helper.

35-36: Inconsistent credential handling across providers.

AzureBlobStorage receives **credentials while S3Storage() and GCSStorage() are instantiated without any. If the other providers also support credential injection (e.g., for explicit key-based auth), consider passing credentials consistently. Otherwise, this asymmetry is acceptable since Azure requires explicit account URL configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py` around lines
35 - 36, The branch handling CloudProvider.AZURE is inconsistent because
AzureBlobStorage is constructed with **credentials while S3Storage() and
GCSStorage() are created without; either pass credentials to all providers or
stop forwarding them to Azure. Inspect the AzureBlobStorage constructor
signature and, if other providers accept explicit credentials, change the
S3Storage() and GCSStorage() instantiations to S3Storage(**credentials) and
GCSStorage(**credentials); otherwise remove **credentials from AzureBlobStorage
and build its required account_url/connection parameters explicitly (or map the
credentials into the specific Azure args) so credential handling is consistent
across CloudProvider.AZURE, CloudProvider.S3 and CloudProvider.GCS.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py`:
- Around line 288-300: The current list_files implementation uses islice over
blobs_iterator which iterates through all preceding items; change it to use
Azure server-side pagination: call
container_client.list_blobs(name_starts_with=prefix,
results_per_page=page_size).by_page() (use results_per_page and by_page on the
pages iterator), validate page_number and page_size >= 1, advance the pages
iterator page_number-1 times catching StopIteration to return empty results,
then materialize the current page into blob_names via [blob.name for blob in
current_page], and determine has_next by attempting next(pages) (catch
StopIteration to set False); update the logic in list_files to use these symbols
(list_files, container_client.list_blobs, results_per_page, by_page, pages)
instead of islice.
- Around line 62-69: The code currently silently falls back to
DefaultAzureCredential when only a subset of client credentials are provided;
update the credential selection logic so that if any of tenant_id, client_id, or
client_secret are provided but not all three, you raise a clear ValueError (or
appropriate exception) instead of using DefaultAzureCredential. In the
blob_storage credential creation block (where ClientSecretCredential and
DefaultAzureCredential are used and the variable credential is assigned),
validate that either all three values are truthy (then construct
ClientSecretCredential with tenant_id, client_id, client_secret) or none are
provided (then use DefaultAzureCredential); if a partial set is present, raise
an error mentioning the missing fields.
- Around line 112-115: Exception handlers in functions like get_file,
save_large_file, save_small_file, delete_file, generate_presigned_url, and
list_files currently re-raise new exceptions without chaining the original
exception; update each except block to use "raise <NewException>(...) from e"
(e.g., raise CloudStorageFileNotFoundError(bucket_name, file_path) from e or
raise Exception(f'...: {e}') from e) so the original traceback is preserved and
included in the new exception; locate the except clauses that catch
ResourceNotFoundError and general Exception in those functions and replace their
raise statements to include "from e".

In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py`:
- Around line 152-159: The file_protocol method currently claims to return str
but returns None for unknown providers; change it to always return a str by
replacing the final "return None" path with raising a ValueError (include the
unknown self.provider in the message) so callers of file_protocol() never get
None; update the file_protocol signature if you prefer the alternative (->
Optional[str]) and handle callers accordingly, but the recommended fix is to
modify file_protocol to raise ValueError for unsupported CloudProvider values.
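The page-advance logic from the list_files comment above can be sketched independently of the Azure SDK. Here `pages` stands in for the iterator that `list_blobs(...).by_page()` returns, and `fetch_page` is a hypothetical name. Note that peeking with `next()` consumes a page; with the real SDK, `has_next` is better derived from the pager's continuation token.

```python
def fetch_page(pages, page_number: int):
    """Return (items, has_next) for the 1-based page_number of a pages iterator."""
    if page_number < 1:
        raise ValueError("page_number must be >= 1")
    for _ in range(page_number - 1):   # skip preceding pages
        try:
            next(pages)
        except StopIteration:
            return [], False           # requested page is past the end
    try:
        current = list(next(pages))
    except StopIteration:
        return [], False
    try:
        next(pages)                    # peek: is there a following page?
        has_next = True
    except StopIteration:
        has_next = False
    return current, has_next
```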


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa3d1ce4-ba50-419e-834c-c8c9e3d0a727

📥 Commits

Reviewing files that changed from the base of the PR and between 0d2e44c and c951d3c.

⛔ Files ignored due to path filters (1)
  • wavefront/server/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • wavefront/server/packages/flo_cloud/flo_cloud/azure/__init__.py
  • wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py
  • wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py
  • wavefront/server/packages/flo_cloud/pyproject.toml

Comment thread wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py Outdated
Comment on lines +112 to +115
except ResourceNotFoundError:
raise CloudStorageFileNotFoundError(bucket_name, file_path)
except Exception as e:
raise Exception(f'Error reading file from Azure Blob Storage: {str(e)}')

⚠️ Potential issue | 🟡 Minor

Preserve exception chain with raise ... from e.

Static analysis correctly flags this pattern. When re-raising exceptions, use raise ... from e to preserve the original traceback for debugging. This applies to all exception handlers in this file (lines 113, 115, 144-146, 172-174, 190, 192, 260-262, 303, 305).

🔧 Example fix for get_file
         except ResourceNotFoundError:
-            raise CloudStorageFileNotFoundError(bucket_name, file_path)
+            raise CloudStorageFileNotFoundError(bucket_name, file_path) from None
         except Exception as e:
-            raise Exception(f'Error reading file from Azure Blob Storage: {str(e)}')
+            raise Exception(f'Error reading file from Azure Blob Storage: {e!r}') from e

Apply the same pattern to save_large_file, save_small_file, delete_file, generate_presigned_url, and list_files.
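A minimal runnable demonstration of why chaining matters: `raise ... from e` records the original error on `__cause__`, preserving the traceback, while `from None` deliberately suppresses it. The `KeyError` below is a stand-in for `ResourceNotFoundError` so the example needs no Azure SDK.

```python
class CloudStorageFileNotFoundError(Exception):
    pass

def get_file():
    try:
        raise KeyError("blob missing")  # stand-in for ResourceNotFoundError
    except KeyError as e:
        # Chain the original exception so its traceback survives.
        raise CloudStorageFileNotFoundError("bucket/key") from e

try:
    get_file()
except CloudStorageFileNotFoundError as err:
    assert isinstance(err.__cause__, KeyError)  # original error is attached
```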

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 113-113: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 114-114: Do not catch blind exception: Exception

(BLE001)


[warning] 115-115: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 115-115: Use explicit conversion flag

Replace with conversion flag

(RUF010)



coderabbitai (bot) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py (1)

83-91: Regex only matches public Azure cloud endpoints.

The pattern https://([^.]+)\.blob\.core\.windows\.net won't match Azure Government (.blob.core.usgovcloudapi.net) or China (.blob.core.chinacloudapi.cn) endpoints. If sovereign cloud support isn't needed now, consider adding a TODO or documenting this limitation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py` around
lines 83 - 91, _parse_account_name currently only handles public Azure
endpoints; update it to support sovereign clouds or document the limitation.
Either broaden the regex in _parse_account_name to accept variants like
.blob.core.windows.net, .blob.core.usgovcloudapi.net, and
.blob.core.chinacloudapi.cn (e.g., allow suffixes matching
core\.(windows|usgovcloudapi|chinacloudapi)\.net|chinacloudapi\.cn) and ensure
match.group(1) still returns the account name, or add a TODO/comment above
_parse_account_name stating that sovereign cloud endpoints are not supported and
validate/raise a clearer error if an unsupported domain is provided. Ensure the
change references the _parse_account_name method so reviewers can locate it.
wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py (1)

152-159: Return type updated but None fallback remains — consider raising for unknown providers.

The signature now correctly reflects Optional[str], addressing the past type mismatch. However, returning None for unrecognized providers could cause issues for callers that don't handle it (e.g., the test fixture in conftest.py mocks this with a string return). Consider raising a ValueError for unsupported providers to fail fast, aligning with the error-raising pattern in _convert_to_valid_type.

🛠️ Suggested alternative
     def file_protocol(self) -> Optional[str]:
         if self.provider == CloudProvider.AWS:
             return 's3'
         elif self.provider == CloudProvider.GCP:
             return 'gs'
         elif self.provider == CloudProvider.AZURE:
             return 'azure'
-        return None
+        raise ValueError(f'Unsupported provider: {self.provider}')

If raising is adopted, update the return type back to str.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py` around lines
152 - 159, The file_protocol method currently returns None for unknown providers
which can hide errors; change file_protocol to raise a ValueError for
unsupported CloudProvider values (mirror the error-raising behavior of
_convert_to_valid_type) and update its signature to return str (not
Optional[str]); update any callers/tests (e.g., the fixture in conftest.py) that
mock file_protocol to return a string or to expect the raised ValueError so
tests reflect the new fail-fast behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py`:
- Around line 262-263: The constructed blob URL uses the raw blob name (key)
which can contain unsafe characters; update the blob URL assembly to URL-encode
the blob key path component before concatenation. Use urllib.parse.quote(key,
safe='/') (or equivalent URL path encoding) when building blob_url so
_account_url, bucket_name and encoded key are combined, then append
?{sas_token}; keep the sas_token unencoded and only encode the path component
used in blob_url.
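The URL-assembly fix above can be sketched as follows: percent-encode only the blob-key path component, keeping '/' separators and the SAS token untouched. `build_blob_url` is an illustrative name, not the PR's actual function.

```python
from urllib.parse import quote

def build_blob_url(account_url: str, container: str, key: str, sas_token: str) -> str:
    """Assemble a SAS URL, encoding unsafe characters in the blob key only."""
    encoded_key = quote(key, safe="/")  # e.g. spaces -> %20, '#' -> %23
    return f"{account_url}/{container}/{encoded_key}?{sas_token}"
```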


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84946d72-9d7d-4104-92e5-a869c421b02e

📥 Commits

Reviewing files that changed from the base of the PR and between c951d3c and 0f8c809.

📒 Files selected for processing (2)
  • wavefront/server/packages/flo_cloud/flo_cloud/azure/blob_storage.py
  • wavefront/server/packages/flo_cloud/flo_cloud/cloud_storage.py

@vizsatiz vizsatiz merged commit 055304a into develop Mar 24, 2026
9 checks passed
@vizsatiz vizsatiz deleted the CU-86d2d4mer-Azure-blob-storage-support branch March 24, 2026 11:26
thomastomy5 pushed a commit that referenced this pull request Apr 27, 2026
* feat(flo-cloud): add Azure Blob Storage provider


* resolved review comments

* url encoding blob key
