
Doc question: Can protobuf extension be used with "index_parallel"? #11003

@cswarth

Description


The Protobuf extension documentation demonstrates how to use the extension to decode Kafka events.
Can the Protobuf extension also be used to decode files, or is it only suitable for streaming input?

I tried to write an example "index_parallel" task definition that uses protobuf, but it gets rejected:

{"error":"Cannot construct instance of `org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexIngestionSpec`,
  problem: Cannot use parser and inputSource together. Try using inputFormat instead of parser.
 at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 77, column: 1] 
(through reference chain: org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask[\"spec\"])"
}

Task definition:

curl -v http://localhost:8888/druid/indexer/v1/task -H 'Content-Type: application/json' -d '
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "filter": "metrics.bin",
        "baseDir": "./"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "dataSource": "metrics",
      "parser": {
        "type": "protobuf",
        "descriptor": "file:///tmp/metrics.desc",
        "protoMessageType": "Metrics",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "unit",
              "http_method",
              "http_code",
              "page",
              "metricType",
              "server"
            ],
            "dimensionExclusions": [
              "timestamp",
              "value"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "name": "value_sum",
          "fieldName": "value",
          "type": "doubleSum"
        },
        {
          "name": "value_min",
          "fieldName": "value",
          "type": "doubleMin"
        },
        {
          "name": "value_max",
          "fieldName": "value",
          "type": "doubleMax"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    }
  }
}
'
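The error message names the conflict directly: a spec may not set both `ioConfig.inputSource` and `dataSchema.parser`. The following is a minimal, hypothetical pre-submit check that flags this combination before the spec is sent to the task endpoint. It is not part of Druid. The helper names `load_task_spec` and `has_parser_input_source_conflict` are made up for illustration, and the script assumes the task JSON has been saved to a local file.

```python
import json
import sys


def load_task_spec(path: str) -> dict:
    """Read a task definition (the JSON body of the curl request) from a file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def has_parser_input_source_conflict(task: dict) -> bool:
    """Return True if the spec sets both dataSchema.parser and ioConfig.inputSource.

    Hypothetical check mirroring the Druid error
    "Cannot use parser and inputSource together".
    """
    spec = task.get("spec", {})
    has_parser = "parser" in spec.get("dataSchema", {})
    has_input_source = "inputSource" in spec.get("ioConfig", {})
    return has_parser and has_input_source


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage (assumed file name): python check_spec.py task.json
        task = load_task_spec(sys.argv[1])
        if has_parser_input_source_conflict(task):
            print("conflict: spec uses both parser and inputSource")
        else:
            print("no parser/inputSource conflict found")
```

The check inspects only those two keys and does not otherwise validate the spec.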
