I tried to make an example "index_parallel" task definition that uses protobuf, but it gets rejected:
{"error":"Cannot construct instance of `org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexIngestionSpec`,
problem: Cannot use parser and inputSource together. Try using inputFormat instead of parser.
at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 77, column: 1]
(through reference chain: org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask[\"spec\"])"
}
Task definition:

curl -v http://localhost:8888/druid/indexer/v1/task -H 'Content-Type: application/json' -d '
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "filter": "metrics.bin",
        "baseDir": "./"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "dataSource": "metrics",
      "parser": {
        "type": "protobuf",
        "descriptor": "file:///tmp/metrics.desc",
        "protoMessageType": "Metrics",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "unit",
              "http_method",
              "http_code",
              "page",
              "metricType",
              "server"
            ],
            "dimensionExclusions": [
              "timestamp",
              "value"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "name": "value_sum",
          "fieldName": "value",
          "type": "doubleSum"
        },
        {
          "name": "value_min",
          "fieldName": "value",
          "type": "doubleMin"
        },
        {
          "name": "value_max",
          "fieldName": "value",
          "type": "doubleMax"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    }
  }
}
'
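Based on the error message, my guess is that an inputSource-based ioConfig cannot be combined with the legacy parser at all, and that the protobuf decoding would have to move into an inputFormat, with timestampSpec and dimensionsSpec promoted to the top level of dataSchema. My best guess at that rewrite is below; it is untested, and it assumes the extension provides a "protobuf" inputFormat with a file-based protoBytesDecoder, which I am not sure my Druid version has:

curl -v http://localhost:8888/druid/indexer/v1/task -H 'Content-Type: application/json' -d '
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "filter": "metrics.bin",
        "baseDir": "./"
      },
      "inputFormat": {
        "type": "protobuf",
        "protoBytesDecoder": {
          "type": "file",
          "descriptor": "file:///tmp/metrics.desc",
          "protoMessageType": "Metrics"
        }
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": { "type": "dynamic" }
    },
    "dataSchema": {
      "dataSource": "metrics",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": {
        "dimensions": ["unit", "http_method", "http_code", "page", "metricType", "server"],
        "dimensionExclusions": ["timestamp", "value"]
      },
      "metricsSpec": [
        { "name": "count", "type": "count" },
        { "name": "value_sum", "fieldName": "value", "type": "doubleSum" },
        { "name": "value_min", "fieldName": "value", "type": "doubleMin" },
        { "name": "value_max", "fieldName": "value", "type": "doubleMax" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    }
  }
}
'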
The Protobuf extension documentation demonstrates use of the extension to decode Kafka events.
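That example looks roughly like the sketch below (condensed from my memory of the docs; the topic name and consumer properties are placeholders, not my setup), with the same protobuf parser wired to a Kafka supervisor:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "metrics",
    "parser": {
      "type": "protobuf",
      "descriptor": "file:///tmp/metrics.desc",
      "protoMessageType": "Metrics",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": {
          "dimensions": ["unit", "http_method", "http_code", "page", "metricType", "server"]
        }
      }
    }
  },
  "ioConfig": {
    "topic": "metrics_pb",
    "consumerProperties": { "bootstrap.servers": "localhost:9092" }
  }
}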
Can the protobuf extension also be used to decode files, or is it only suitable for streaming input?