From f66c1549384c2043121e610b5091b8ab05e48f8b Mon Sep 17 00:00:00 2001
From: Steve Hetland
Date: Thu, 22 Apr 2021 11:37:36 -0700
Subject: [PATCH 1/7] Mention Azure Data Lake

---
 docs/ingestion/native-batch.md | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 48539bb23e7f..7bcec98e35c5 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -1004,10 +1004,8 @@ Google Cloud Storage object:

 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.

-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source supports reading objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
+specified as a list of file URI strings or prefixes. The Azure input source is splittable, and supports [Parallel task](#parallel-task) processing whereby each `index_parallel` worker task reads a single object.

 Sample specs:

@@ -1066,17 +1064,17 @@ Sample specs:
 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\/\"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\/\". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\/\"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to be ingested, in the form "azure://\/\". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|

-Note that the Azure input source will skip all empty objects only when `prefixes` is specified.
+Note that the Azure input source skips all empty objects only when `prefixes` is specified.

-Azure Blob object:
+The `objects` property is:

 |property|description|default|required?|
 |--------|-----------|-------|---------|
-|bucket|Name of the Azure Blob Storage container|None|yes|
+|bucket|Name of the Azure Blob Storage or Azure Data Lake container|None|yes|
 |path|The path where data is located.|None|yes|

 ### HDFS Input Source

From 2258797c1574fd1250682e1b69c8ccf30e4d3b15 Mon Sep 17 00:00:00 2001
From: Steve Hetland
Date: Thu, 22 Apr 2021 11:48:10 -0700
Subject: [PATCH 2/7] Make consistent with other entries

---
 docs/ingestion/native-batch.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 7bcec98e35c5..26e8c604251d 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -1004,8 +1004,8 @@ Google Cloud Storage object:

 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.

-The Azure input source supports reading objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
-specified as a list of file URI strings or prefixes. The Azure input source is splittable, and supports [Parallel task](#parallel-task) processing whereby each `index_parallel` worker task reads a single object.
+The Azure input source is used to read objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
+specified as a list of file URI strings or prefixes. The Azure input source is splittable and can be used by the [Parallel task](#parallel-task), where each worker task reads a single object.

 Sample specs:


From 918a2b4ef5ce3805d187c3105f739bb9b52dfa06 Mon Sep 17 00:00:00 2001
From: sthetland
Date: Fri, 23 Apr 2021 09:54:54 -0700
Subject: [PATCH 3/7] Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
---
 docs/ingestion/native-batch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 26e8c604251d..08b30e7cd3aa 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -1004,7 +1004,7 @@ Google Cloud Storage object:

 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.

-The Azure input source is used to read objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
+The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
 specified as a list of file URI strings or prefixes. The Azure input source is splittable and can be used by the [Parallel task](#parallel-task), where each worker task reads a single object.

 Sample specs:

From 680fd19455d3adcfd7db79a0a77f18a9b44597d4 Mon Sep 17 00:00:00 2001
From: sthetland
Date: Fri, 23 Apr 2021 09:58:33 -0700
Subject: [PATCH 4/7] Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
---
 docs/ingestion/native-batch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 08b30e7cd3aa..87ea95baeeb7 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -1005,7 +1005,7 @@ Google Cloud Storage object:
 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.

 The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
-specified as a list of file URI strings or prefixes. The Azure input source is splittable and can be used by the [Parallel task](#parallel-task), where each worker task reads a single object.
+specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](#parallel-task) indexing and each worker task reads one chunk of the split data.

 Sample specs:


From d3eb26414854d84bab8d229fbf3ae47eab6e90f8 Mon Sep 17 00:00:00 2001
From: Steve Hetland
Date: Fri, 23 Apr 2021 11:22:52 -0700
Subject: [PATCH 5/7] Another reference added to bullet

---
 docs/ingestion/native-batch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 87ea95baeeb7..aec12e6601d2 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -58,7 +58,7 @@ The supported splittable input formats for now are:

 - [`s3`](#s3-input-source) reads data from AWS S3 storage.
 - [`gs`](#google-cloud-storage-input-source) reads data from Google Cloud Storage.
-- [`azure`](#azure-input-source) reads data from Azure Blob Storage.
+- [`azure`](#azure-input-source) reads data from Azure Blob Storage and Azure Data Lake.
 - [`hdfs`](#hdfs-input-source) reads data from HDFS storage.
 - [`http`](#http-input-source) reads data from HTTP servers.
 - [`local`](#local-input-source) reads data from local storage.

From fcf76cab3fe7c68b8757b72f0c9e66cb5ff539d3 Mon Sep 17 00:00:00 2001
From: sthetland
Date: Tue, 27 Apr 2021 16:44:22 -0700
Subject: [PATCH 6/7] Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
---
 docs/ingestion/native-batch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index aec12e6601d2..5341ee2e2143 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -1065,7 +1065,7 @@ Sample specs:
 |--------|-----------|-------|---------|
 |type|This should be `azure`.|None|yes|
 |uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\/\"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure objects to be ingested, in the form "azure://\/\". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\/\". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
 |objects|JSON array of Azure objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|

 Note that the Azure input source skips all empty objects only when `prefixes` is specified.

From c9af93cfb12753972e7ed66ac1ce78dff85db534 Mon Sep 17 00:00:00 2001
From: sthetland
Date: Tue, 27 Apr 2021 16:44:29 -0700
Subject: [PATCH 7/7] Update docs/ingestion/native-batch.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
---
 docs/ingestion/native-batch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index 5341ee2e2143..257789b424a4 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -1066,7 +1066,7 @@ Sample specs:
 |type|This should be `azure`.|None|yes|
 |uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\/\"|None|`uris` or `prefixes` or `objects` must be set|
 |prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\/\". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|

 Note that the Azure input source skips all empty objects only when `prefixes` is specified.
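
Reviewer note: as a quick sanity check of the property table this series touches, the following is a minimal sketch of how an `azure` input source block could be assembled and checked before it is pasted into an ingestion spec. The helper name `build_azure_input_source` and the container and path values are hypothetical and not part of Druid. The check covers only what the table states: one of `uris`, `prefixes`, or `objects` must be set, and each `objects` entry needs `bucket` and `path`.

```python
import json


def build_azure_input_source(uris=None, prefixes=None, objects=None):
    """Return an `azure` inputSource dict (hypothetical helper, not a Druid API)."""
    # Keep only the location properties that were actually supplied.
    chosen = {
        name: value
        for name, value in (("uris", uris), ("prefixes", prefixes), ("objects", objects))
        if value
    }
    # The docs table says one of the three must be set.
    if not chosen:
        raise ValueError("one of `uris`, `prefixes`, or `objects` must be set")
    # Each `objects` entry requires `bucket` (the container name) and `path`.
    for obj in objects or []:
        missing = {"bucket", "path"} - obj.keys()
        if missing:
            raise ValueError(f"object entry is missing required keys: {sorted(missing)}")
    return {"type": "azure", **chosen}


# Placeholder container and path, chosen for illustration only.
spec = build_azure_input_source(
    objects=[{"bucket": "my-container", "path": "data/file1.json"}]
)
print(json.dumps(spec, indent=2))
```

The resulting dict is meant to sit under `ioConfig.inputSource` in a spec. Whether Druid also rejects specs that set more than one of the three properties is not stated in the table, so this sketch does not enforce that.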