4 changes: 3 additions & 1 deletion docs/content/ingestion/firehose.md
@@ -87,7 +87,8 @@ The below configurations can be optionally used for tuning the firehose performa
### IngestSegmentFirehose

This Firehose can be used to read the data from existing druid segments.
It can be used ingest existing druid segments using a new schema and change the name, dimensions, metrics, rollup, etc. of the segment.
It can be used to ingest existing druid segments using a new schema and change the name, dimensions, metrics, rollup, etc. of the segment.
This firehose is _splittable_ and can be used by [native parallel index tasks](./native_tasks.html#parallel-index-task).

A sample ingest firehose spec is shown below -

```json
@@ -106,6 +107,7 @@ A sample ingest firehose spec is shown below -
|dimensions|The list of dimensions to select. If left empty, no dimensions are returned. If left null or not defined, all dimensions are returned. |no|
|metrics|The list of metrics to select. If left empty, no metrics are returned. If left null or not defined, all metrics are selected.|no|
|filter| See [Filters](../querying/filters.html)|no|
|maxInputSegmentBytesPerTask|When used with the native parallel index task, the maximum number of bytes of input segments to process in a single task. If a single segment is larger than this number, it will be processed by itself in a single task (input segments are never split across tasks). Defaults to 150MB.|no|
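
For illustration, an `ingestSegment` firehose spec that sets the new option might look like the following sketch; the dataSource, interval, and byte value are placeholders, not taken from this change:

```json
{
  "type": "ingestSegment",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2014-01-01",
  "maxInputSegmentBytesPerTask": 157286400
}
```

With the 150MB default and, say, roughly 1.5GB of input segments, the parallel index task would group the segments into about ten subtasks; a single segment larger than the limit is still handled whole by one task, since input segments are never split across tasks.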

#### SqlFirehose

2 changes: 1 addition & 1 deletion docs/content/ingestion/native_tasks.md
@@ -45,7 +45,7 @@ task statuses. If one of them fails, it retries the failed task until the retryi
If all worker tasks succeed, then it collects the reported list of generated segments and publishes those segments at once.

To use this task, the `firehose` in `ioConfig` should be _splittable_. If it's not, this task runs sequentially. The
current splittable firehoses are [`LocalFirehose`](./firehose.html#localfirehose), [`HttpFirehose`](./firehose.html#httpfirehose)
current splittable firehoses are [`LocalFirehose`](./firehose.html#localfirehose), [`IngestSegmentFirehose`](./firehose.html#ingestsegmentfirehose), [`HttpFirehose`](./firehose.html#httpfirehose)
, [`StaticS3Firehose`](../development/extensions-core/s3.html#statics3firehose), [`StaticAzureBlobStoreFirehose`](../development/extensions-contrib/azure.html#staticazureblobstorefirehose)
, [`StaticGoogleBlobStoreFirehose`](../development/extensions-contrib/google.html#staticgoogleblobstorefirehose), and [`StaticCloudFilesFirehose`](../development/extensions-contrib/cloudfiles.html#staticcloudfilesfirehose).
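
As a rough sketch of how this looks from the task side (the surrounding `dataSchema` and `tuningConfig` are omitted, field names follow the native task docs, and the dataSource and interval are placeholders), a parallel index task's `ioConfig` could point at existing segments like this:

```json
"ioConfig": {
  "type": "index_parallel",
  "firehose": {
    "type": "ingestSegment",
    "dataSource": "wikipedia",
    "interval": "2013-01-01/2014-01-01"
  }
}
```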

@@ -506,10 +506,12 @@ private static IndexIOConfig createIoConfig(
new IngestSegmentFirehoseFactory(
dataSchema.getDataSource(),
interval,
null,
null, // no filter
// set dimensions and metrics names to make sure that the generated dataSchema is used for the firehose
dataSchema.getParser().getParseSpec().getDimensionsSpec().getDimensionNames(),
Arrays.stream(dataSchema.getAggregators()).map(AggregatorFactory::getName).collect(Collectors.toList()),
null, // maxInputSegmentBytesPerTask: null falls back to the default
toolbox.getIndexIO(),
coordinatorClient,
segmentLoaderFactory,