I am attempting to combine two datasources using the `CombiningFirehoseFactory`.
For reference, I am following the updating-data tutorial.
My ingestion spec looks like this:
```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "updates-tutorial",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "dimensionsSpec": {
            "dimensions": ["animal"]
          },
          "timestampSpec": {
            "column": "timestamp",
            "format": "iso"
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "longSum", "name": "number", "fieldName": "number" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "hour",
        "queryGranularity": "minute",
        "intervals": ["2018-01-01/2018-01-03"],
        "rollup": false
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "combining",
        "delegates": [
          {
            "type": "ingestSegment",
            "dataSource": "s1",
            "interval": "2018-01-01/2018-01-03"
          },
          {
            "type": "ingestSegment",
            "dataSource": "s2",
            "interval": "2018-01-01/2018-01-03"
          }
        ]
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index",
      "maxRowsPerSegment": 5000000,
      "maxRowsInMemory": 25000
    }
  }
}
```
After reading into the code a bit, I believe that `CombiningFirehoseFactory` allows you to combine multiple datasources.
But when I submit the task, I get the following error:
```
2020-02-21T11:35:39,222 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in NOT_STARTED.
java.lang.ClassCastException: org.apache.druid.segment.realtime.firehose.CombiningFirehoseFactory cannot be cast to org.apache.druid.data.input.FiniteFirehoseFactory
	at org.apache.druid.indexing.common.task.IndexTask$IndexIOConfig.getNonNullInputSource(IndexTask.java:1148) ~[druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:477) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
```
I am not sure whether this is an issue with the datasources I selected or a configuration issue.
In the interim, I have also tried switching the delegate firehose type from `ingestSegment` to `local`, but I continue to get the same error.
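For completeness, this is roughly what the `local` variant of the `ioConfig` firehose looked like; the `baseDir` and `filter` values here are placeholders standing in for my actual file locations, not the exact paths I used:

```json
"firehose": {
  "type": "combining",
  "delegates": [
    {
      "type": "local",
      "baseDir": "quickstart/tutorial",
      "filter": "updates-data.json"
    },
    {
      "type": "local",
      "baseDir": "quickstart/tutorial",
      "filter": "updates-data2.json"
    }
  ]
}
```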
I would really appreciate any help on this, or if someone could correct my understanding of the matter.