Introduce StorageConnector for Azure #14660
Conversation
```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public abstract class ChunkingStorageConnector<T> implements StorageConnector
```

Can you please Javadoc this, since it's the crux of this PR.
```java
public ChunkingStorageConnectorParameters<T> build()
{
  Preconditions.checkArgument(start >= 0, "'start' not provided or an incorrect value [%s] passed", start);
  Preconditions.checkArgument(end >= 0, "'end' not provided or an incorrect value [%s] passed", end);
```

Would end < start return a good error message?

Added a check for this in the PR as well!
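A minimal sketch of the combined validation discussed above. `checkArgument` here is a hand-rolled stand-in for Guava's `Preconditions.checkArgument` so the snippet is self-contained; the `validate` method name is illustrative, not the PR's actual API.

```java
public class ChunkRangeValidation
{
    // Minimal stand-in for Guava's Preconditions.checkArgument, so the
    // snippet compiles without the Guava dependency.
    static void checkArgument(boolean condition, String messageTemplate, Object... args)
    {
        if (!condition) {
            throw new IllegalArgumentException(String.format(messageTemplate, args));
        }
    }

    // The checks from the diff above, plus the end >= start check the
    // reviewer suggested, so end < start fails with a clear message.
    static void validate(long start, long end)
    {
        checkArgument(start >= 0, "'start' not provided or an incorrect value [%s] passed", start);
        checkArgument(end >= 0, "'end' not provided or an incorrect value [%s] passed", end);
        checkArgument(end >= start, "'end' [%s] should be >= 'start' [%s]", end, start);
    }

    public static void main(String[] args)
    {
        validate(0, 10); // valid range passes silently

        try {
            validate(10, 5); // end < start now fails fast
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            // message names both bounds, which is the improvement asked for
        }
    }
}
```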
```java
{
  private static final long DOWNLOAD_MAX_CHUNK_SIZE_BYTES = 100_000_000;

  public ChunkingStorageConnector()
```

Does this need to be public?

Reverted the change so that the individual connectors can control the chunk sizes. This is used primarily for testing for now, though it can be extended to the real implementations as well.
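One way the constructor discussion above could play out is a constructor overload that lets tests (and, later, individual connectors) override the chunk size. This is a sketch only; the class and field names are assumptions, not the PR's exact code.

```java
public class ChunkSizeSketch
{
    // Default maximum bytes fetched per download chunk, as in the diff above.
    static final long DOWNLOAD_MAX_CHUNK_SIZE_BYTES = 100_000_000;

    final long chunkSizeBytes;

    // Public no-arg constructor keeps the default behavior.
    public ChunkSizeSketch()
    {
        this(DOWNLOAD_MAX_CHUNK_SIZE_BYTES);
    }

    // Overload so individual connectors and tests can control the chunk size.
    ChunkSizeSketch(long chunkSizeBytes)
    {
        this.chunkSizeBytes = chunkSizeBytes;
    }

    public static void main(String[] args)
    {
        assert new ChunkSizeSketch().chunkSizeBytes == 100_000_000L;
        assert new ChunkSizeSketch(1024).chunkSizeBytes == 1024L;
    }
}
```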
adarshsanjeev left a comment:

Looks good to me overall.
```java
  params.getMaxRetry()
),
outFile,
new byte[8 * 1024],
```

I know this code was only moved, but could you add a comment on why these numbers were chosen?
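The kind of explanatory comment the reviewer asks for might look like the sketch below: a buffered stream copy where the `8 * 1024` buffer choice is documented inline. The class and method names are illustrative, not the code being moved in the PR.

```java
import java.io.*;

public class ChunkedCopy
{
    // 8 KiB is a conventional I/O buffer size: large enough to amortize the
    // per-read call overhead, small enough that per-download memory stays
    // negligible even with many concurrent downloads.
    static final int COPY_BUFFER_SIZE = 8 * 1024;

    // Copies everything from 'in' to 'out' through a fixed-size buffer and
    // returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException
    {
        byte[] buffer = new byte[COPY_BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException
    {
        byte[] data = new byte[20_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        assert copied == 20_000L;
        assert out.size() == 20_000;
    }
}
```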
cryptoe left a comment:

Changes LGTM. The user-facing docs are still remaining.
Thanks, @adarshsanjeev @cryptoe for the reviews, and @dhananjay1308 for testing the changes out on a cluster.
Description
This PR adds a storage connector that interacts with Azure's blob storage using the Azure API already used in Druid. This allows durable storage and MSQ's interactive APIs to work with Azure.

It also refactors the existing S3 connector so that the chunked downloads it currently performs can be extended to other connectors. (Note: this refactoring is ported from PR #14611, since that work is currently parked.)
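The chunked-download refactoring described above boils down to splitting a byte range into fixed-size pieces that can be fetched one at a time. A hypothetical sketch of that range-splitting step, with illustrative names that are not the PR's actual API:

```java
import java.util.*;

public class ChunkPlanner
{
    // Splits the half-open byte range [start, end) into consecutive chunks of
    // at most maxChunkSize bytes; the last chunk may be shorter.
    static List<long[]> chunks(long start, long end, long maxChunkSize)
    {
        List<long[]> result = new ArrayList<>();
        for (long offset = start; offset < end; offset += maxChunkSize) {
            result.add(new long[]{offset, Math.min(offset + maxChunkSize, end)});
        }
        return result;
    }

    public static void main(String[] args)
    {
        // A 250-byte range with 100-byte chunks yields [0,100), [100,200), [200,250).
        List<long[]> c = chunks(0, 250, 100);
        assert c.size() == 3;
        assert c.get(0)[0] == 0 && c.get(0)[1] == 100;
        assert c.get(2)[0] == 200 && c.get(2)[1] == 250;
    }
}
```

A connector extending the base class would then only need to supply "fetch bytes [a, b) of this object" for its storage service, which is what lets the S3 logic carry over to Azure.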
Testing plan
Release note
An Azure connector has been introduced: MSQ's fault tolerance and durable storage can now be used with Microsoft Azure's blob storage. In addition, the newly introduced queries from deep storage can now store and fetch their results from Azure's blob storage.
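Enabling this would be a matter of runtime configuration. The fragment below is a hedged sketch only: the property names are assumptions modeled on Druid's existing `druid.msq.intermediate.storage.*` durable-storage settings for S3, and should be checked against the Druid docs for the release this ships in.

```properties
# Sketch, not verified against the shipped docs: point MSQ durable storage at Azure.
druid.msq.intermediate.storage.enable=true
druid.msq.intermediate.storage.type=azure
# Container and prefix under which intermediate data / results are written
# (names here are placeholders).
druid.msq.intermediate.storage.container=my-container
druid.msq.intermediate.storage.prefix=msq-durable
```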
Key changed/added classes in this PR
This PR has: