
JSONPath length() function does not work in flattenSpec #11291

@bothra90

Unable to ingest nested JSON data when a flattenSpec field uses the JSONPath length() function.

Description

The ingestion task fails with the following stack trace:

2021-05-22T01:56:09,614 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS.
java.lang.ClassCastException: java.lang.Integer cannot be cast to com.fasterxml.jackson.databind.JsonNode
	at org.apache.druid.java.util.common.parsers.JSONFlattenerMaker.lambda$makeJsonPathExtractor$2(JSONFlattenerMaker.java:89) ~[druid-core-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.java.util.common.parsers.ObjectFlatteners$1$1.get(ObjectFlatteners.java:116) ~[druid-core-0.21.0-iap3.jar:0.21.0-iap3]
	at java.util.Collections$UnmodifiableMap.get(Collections.java:1456) ~[?:1.8.0_262]
	at org.apache.druid.data.input.MapBasedRow.getRaw(MapBasedRow.java:87) ~[druid-core-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.segment.incremental.IncrementalIndex.toIncrementalIndexRow(IncrementalIndex.java:544) ~[druid-processing-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:480) ~[druid-processing-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.segment.realtime.plumber.Sink.add(Sink.java:179) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.add(AppenderatorImpl.java:261) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:409) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver.add(BatchAppenderatorDriver.java:114) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:106) ~[druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:878) ~[druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:494) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:964) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:445) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:451) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:423) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
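
The top frame points at the path extractor in JSONFlattenerMaker. A plausible reading, not confirmed against the Druid source: the JSONPath result is treated as a Jackson JsonNode. That holds for ordinary paths such as $.team.players, but a function expression such as $.team.players.length() yields a plain Integer. The stdlib-only Java sketch below models that failure mode under those assumptions. Node is a hypothetical stand-in for JsonNode, and unsafeExtract/safeExtract are illustrative names, not Druid or Jayway JsonPath APIs.

```java
import java.util.Arrays;

// Stdlib-only model of the suspected failure mode; this is not Druid's code.
class PathExtractorSketch {
  // Hypothetical stand-in for com.fasterxml.jackson.databind.JsonNode.
  static final class Node {
    final Object value;

    Node(Object value) {
      this.value = value;
    }
  }

  // Mirrors the suspected bug: assumes every JSONPath result is a Node.
  static Object unsafeExtract(Object pathResult) {
    Node node = (Node) pathResult; // throws ClassCastException for an Integer
    return node.value;
  }

  // Defensive variant: unwrap Node results, pass scalar results through.
  static Object safeExtract(Object pathResult) {
    if (pathResult instanceof Node) {
      return ((Node) pathResult).value;
    }
    return pathResult; // e.g. the Integer a length() expression would produce
  }

  public static void main(String[] args) {
    // Assumed result shapes for "$.team.players" and "$.team.players.length()".
    Object listResult = new Node(Arrays.asList("a", "b", "c"));
    Object lengthResult = 3;

    System.out.println(safeExtract(listResult));   // prints [a, b, c]
    System.out.println(safeExtract(lengthResult)); // prints 3
    try {
      unsafeExtract(lengthResult);
    } catch (ClassCastException e) {
      System.out.println("unsafe cast failed: " + e.getClass().getSimpleName());
    }
  }
}
```

The sketch only illustrates why an Integer result cannot pass through a cast to a node type. An actual fix would have to handle non-node results wherever the flattener converts the JSONPath value.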

Ingestion spec (excerpt):

{
  ...
  "flattenSpec": {
    "fields": [
      {
        "type": "path",
        "name": "count",
        "expr": "$.team.players.length()"
      }
    ]
  }
  ...
}

Replacing the JSONPath field in the flattenSpec with the following jackson-jq expression avoids the problem:

{
  ...
  "flattenSpec": {
    "fields": [
      {
        "type": "jq",
        "name": "count",
        "expr": ".team.players | length"
      }
    ]
  }
  ...
}

We would prefer JSONPath over jq because, unlike jq, it also works with non-JSON input formats.

Affected Version

Imply version 2021.01-2 LTS
