Skip to content

610 azure blob storage-Implementation for azure blob storage.#640

Closed
shubhammahure wants to merge 11 commits intodevfrom
610-azure-blob-storage
Closed

610 azure blob storage-Implementation for azure blob storage.#640
shubhammahure wants to merge 11 commits intodevfrom
610-azure-blob-storage

Conversation

@shubhammahure
Copy link
Copy Markdown
Contributor

Description

Implementation for azure blob storage.
syncing, upload, download, copying, delete operations

@shubhammahure shubhammahure linked an issue Apr 1, 2025 that may be closed by this pull request
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2025

@CodiumAI-Agent /describe

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2025

@CodiumAI-Agent /review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2025

@CodiumAI-Agent /improve

@ryanweiler92
Copy link
Copy Markdown
Collaborator

@shubhammahure Can you please add a description of your changes and a section on how to test your changes to the PR description?

@ryanweiler92 ryanweiler92 removed the request for review from themaherkhalil May 7, 2025 14:55
@QodoAI-Agent
Copy link
Copy Markdown

QodoAI-Agent commented May 7, 2025

PR Code Suggestions ✨

Latest suggestions up to 1d5233e

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent null blob name returns

Instead of returning null when failing to read file properties, throw an unchecked
exception to prevent null blob names from being added to the upload list. This
ensures error propagation and avoids later NPEs or confusing log entries.

src/prerna/engine/impl/storage/AzureNativeBlobStorageEngine.java [494-496]

 } catch (IOException e) {
     classLogger.error("Failed to read file properties: " + file, e);
-    return null;
+    throw new RuntimeException("Failed to read file properties: " + file, e);
 }
Suggestion importance[1-10]: 8

__

Why: Throwing a runtime exception instead of returning null ensures that failures in uploadFileToBlob are immediately visible and avoids later NPEs or invalid entries in the upload list.

Medium
General
Handle empty-prefix blob listing

Move the paths.isEmpty() check outside the loop so that when no prefixes are
provided you list all blobs once, rather than skipping the entire loop. This ensures
container-wide listing works correctly.

src/prerna/engine/impl/storage/AzureNativeBlobStorageEngine.java [281-289]

 List<String> paths = parseStorageObjectPaths(blobDirectory);
 // Ensure local directory exists
 Files.createDirectories(localDirectory);
 
-List<String> downloadedFiles = new ArrayList<>(), failedFiles = new ArrayList<>();
-boolean found = false;
-for (String path : paths) {
-Iterable<BlobItem> getBlobItems = paths.isEmpty() ? containerClient.listBlobs()
-        : containerClient.listBlobs(new ListBlobsOptions().setPrefix(path), null);
-
-for (BlobItem blobItem : getBlobItems) {
+Iterable<BlobItem> blobItems;
+if (paths.isEmpty()) {
+    blobItems = containerClient.listBlobs();
+} else {
+    // aggregate items from each prefix
+    List<BlobItem> allItems = new ArrayList<>();
+    for (String prefix : paths) {
+        containerClient.listBlobs(new ListBlobsOptions().setPrefix(prefix), null)
+                       .forEach(allItems::add);
+    }
+    blobItems = allItems;
+}
+for (BlobItem blobItem : blobItems) {
     ...
 }
-}
Suggestion importance[1-10]: 8

__

Why: The current loop skips container-wide listing when paths is empty, leading to no downloads; moving the empty check outside fixes a critical logic bug.

Medium
Close Files.walk stream

Use a try-with-resources block for Files.walk so the stream is closed properly after
iteration. This avoids potential resource leaks when walking large directory trees.

src/prerna/engine/impl/storage/AzureNativeBlobStorageEngine.java [117-124]

-Files.walk(localFilePath).filter(Files::isRegularFile).forEach(file -> {
-    try {
-        uploadedFiles.add(uploadFileToBlob(file, localFilePath, containerClient, blobDirectory, metadata));
-    } catch (Exception e) {
-        failedFiles.add(file.toString());
-        classLogger.error("Failed to upload file: " + file, e);
-    }
-});
+try (Stream<Path> stream = Files.walk(localFilePath)) {
+    stream.filter(Files::isRegularFile).forEach(file -> {
+        try {
+            uploadedFiles.add(uploadFileToBlob(file, localFilePath, containerClient, blobDirectory, metadata));
+        } catch (Exception e) {
+            failedFiles.add(file.toString());
+            classLogger.error("Failed to upload file: " + file, e);
+        }
+    });
+}
Suggestion importance[1-10]: 6

__

Why: Wrapping Files.walk in a try-with-resources block ensures the stream is closed properly and prevents resource leaks when traversing directories.

Low

Previous suggestions

Suggestions up to commit 4f682ae
CategorySuggestion                                                                                                                                    Impact
Possible issue
Support listing with empty prefixes

When paths is empty, the loop never runs and no blobs are processed. Extract the
listing logic to handle the empty-prefix case before iterating, ensuring all blobs
are listed when no specific prefix is provided.

src/prerna/engine/impl/storage/AzureNativeBlobStorageEngine.java [281-290]

 List<String> paths = parseStorageObjectPaths(blobDirectory);
-for (String path : paths) {
-    Iterable<BlobItem> getBlobItems = paths.isEmpty() ? containerClient.listBlobs()
-            : containerClient.listBlobs(new ListBlobsOptions().setPrefix(path), null);
+Iterable<BlobItem> blobItems;
+if (paths.isEmpty()) {
+    blobItems = containerClient.listBlobs();
+} else {
+    List<BlobItem> tempList = new ArrayList<>();
+    for (String prefix : paths) {
+        for (BlobItem item : containerClient.listBlobs(new ListBlobsOptions().setPrefix(prefix), null)) {
+            tempList.add(item);
+        }
+    }
+    blobItems = tempList;
+}
+for (BlobItem blobItem : blobItems) {
Suggestion importance[1-10]: 8

__

Why: The current loop skips all blobs when paths is empty, so fixes to handle the empty-prefix case are critical to ensure files are processed correctly.

Medium
Move empty-blob cleanup outside loop

Calling deleteEmptyBlobs inside the per-blob iteration can remove items during
listing and degrade performance. Move the cleanup call outside the loop so it runs
only once before or after processing all blob items.

src/prerna/engine/impl/storage/AzureNativeBlobStorageEngine.java [155-160]

-for (BlobItem blobItem : containerClient
-        .listBlobs(new ListBlobsOptions().setPrefix(blobDirectory.isEmpty() ? null : blobDirectory), null)) {
+// Delete empty folder placeholders once before processing
+deleteEmptyBlobs(containerClient);
+for (BlobItem blobItem : containerClient.listBlobs(new ListBlobsOptions().setPrefix(blobDirectory.isEmpty() ? null : blobDirectory), null)) {
     BlobClient blobClient = containerClient.getBlobClient(blobItem.getName());
     BlobProperties properties = blobClient.getProperties();
-    // Delete empty folder from azure storage (zero-byte blob)
-    deleteEmptyBlobs(containerClient);
Suggestion importance[1-10]: 5

__

Why: Removing deleteEmptyBlobs from inside the per-blob loop reduces unnecessary repeated cleanup calls and improves performance without changing behavior.

Low
General
Strip all leading slashes

The call to replaceFirst("^/", "") only strips a single leading slash. Use
replaceAll("^/+", "") to remove any number of leading slashes and ensure correct
normalization of cloud paths.

src/prerna/engine/impl/storage/AbstractStorageEngine.java [86-98]

 protected List<String> parseStorageObjectPaths(String commaSeparatedPaths) {
     List<String> result = new ArrayList<>();
     String[] parts = commaSeparatedPaths.split(",");
     for (String part : parts) {
         String trimmed = part.trim();
         if (!trimmed.isEmpty()) {
-            String normalized = trimmed.replace("\\", "/").replaceFirst("^/", "");
+            String normalized = trimmed.replace("\\", "/").replaceAll("^/+", "");
             result.add(normalized);
         }
     }
     return result;
 }
Suggestion importance[1-10]: 4

__

Why: Switching from replaceFirst("^/", "") to replaceAll("^/+", "") correctly normalizes paths with multiple leading slashes, improving robustness.

Low
Guard against null input

If commaSeparatedPaths is null or empty, calling split will throw a
NullPointerException or return an unexpected array. Add a guard clause to return an
empty list safely when input is null or blank.

src/prerna/engine/impl/storage/AbstractStorageEngine.java [71-73]

 protected List<Path> parseLocalPaths(String commaSeparatedPaths) throws Exception {
     List<Path> result = new ArrayList<>();
+    if (commaSeparatedPaths == null || commaSeparatedPaths.trim().isEmpty()) {
+        return result;
+    }
     String[] parts = commaSeparatedPaths.split(",");
Suggestion importance[1-10]: 3

__

Why: Adding a null-or-blank check before split prevents potential NPEs for null or empty input, improving method safety with minimal impact.

Low

@shubhammahure
Copy link
Copy Markdown
Contributor Author

by mistakenly i forgot to remove this PR as i have merged GCS, AZURE and AWS.I have created combined PR #674

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Azure Blob Storage

4 participants