druid-orc-extensions hadoop-common dependency is broken #7438

@egor-ryashin

The druid-orc-extensions hadoop-common dependency is broken, or perhaps the extension isn't properly documented.

Affected Version

0.13.0-incubating

Description

Using these extensions:
druid.extensions.loadList=["mysql-metadata-storage", "druid-kafka-indexing-service", "druid-orc-extensions", "druid-hdfs-storage"]

du extensions/*
20032	extensions/druid-cassandra-storage
46928	extensions/druid-hdfs-storage
4240	extensions/druid-kafka-indexing-service
168	extensions/druid-lookups-cached-global
56640	extensions/druid-orc-extensions
136	extensions/druid-s3-extensions
1968	extensions/mysql-metadata-storage

Posting this task:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "my_orc_test",
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "segmentGranularity": "DAY",
        "queryGranularity": "second",
        "intervals": ["2018-07-10/2018-07-11"]
      },
      "parser": {
        "type": "orc",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "time",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "tag"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        },
        "typeString": "struct<time:string,tag:string>",
        "mapFieldNameFormat": "<PARENT>_<CHILD>"
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "local",
        "baseDir": "./",
        "filter": "*.orc"
      }
    }
  }
}

The spawned subtask failed with this error:

2019-04-10T22:16:10,413 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Uncaught Throwable while running task[AbstractTask{id='index_sub_my_orc_test_2019-04-10T22:16:02.661Z', groupId='index_parallel_my_orc_test_2019-04-10T22:15:55.541Z', taskResource=TaskResource{availabilityGroup='index_sub_my_orc_test_2019-04-10T22:16:02.661Z', requiredCapacity=1}, dataSource='my_orc_test', context={}}]
java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable

The log also shows that the required dependency jar was loaded beforehand:

2019-04-10T22:16:04,921 INFO [main] org.apache.druid.initialization.Initialization - added URL[file:/Users/egorryashin/a-druid-0.13-i/extensions/druid-hdfs-storage/hadoop-common-2.8.3.jar] for extension[druid-hdfs-storage]

The task fails both with druid-hdfs-storage loaded and without it.

I spotted this while investigating #6925.
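
A quick way to check which extension directories actually ship the missing class is to scan their jars for the entry org/apache/hadoop/io/Writable.class. The following is only a minimal sketch, assuming the extensions/ layout listed above; the helper name find_class_in_extensions is hypothetical and not part of Druid.

```python
import zipfile
from pathlib import Path


def find_class_in_extensions(extensions_dir, class_name):
    """Map each extension directory name to the jars that contain class_name.

    Hypothetical helper for illustration; not part of Druid.
    """
    # A class such as org.apache.hadoop.io.Writable is stored inside a jar
    # as the entry org/apache/hadoop/io/Writable.class.
    entry = class_name.replace(".", "/") + ".class"
    root = Path(extensions_dir)
    hits = {}
    if not root.is_dir():
        return hits
    for ext_dir in sorted(root.iterdir()):
        if not ext_dir.is_dir():
            continue
        for jar in sorted(ext_dir.glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.setdefault(ext_dir.name, []).append(jar.name)
            except zipfile.BadZipFile:
                # Skip files that are not valid jar/zip archives.
                continue
    return hits


if __name__ == "__main__":
    # Assumes the script is run from the Druid installation directory.
    found = find_class_in_extensions("extensions", "org.apache.hadoop.io.Writable")
    for ext, jars in found.items():
        print(ext, jars)
```

If the class turns up only under druid-hdfs-storage and not under druid-orc-extensions, that would be consistent with the error above, on the assumption that each extension is loaded with its own classloader and cannot see jars belonging to another extension.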
