Ingestion using systemFields results in exception #16709

@petermarshallio

Running the following SQL-based ingestion, which uses `systemFields`, fails with a `ClassCastException`: `org.apache.druid.segment.transform.TransformedInputRow` cannot be cast to `org.apache.druid.data.input.ListBasedInputRow`.

Props to @317brian for originally finding this issue.

Affected Version

Druid 30.

Description

Run the following SQL in Druid 30:

```sql
REPLACE INTO "example-taxitrips-systemfields" OVERWRITE ALL
WITH "ext" AS (
  SELECT *
  FROM TABLE(EXTERN('{
            "type":"http",
            "systemFields":["__file_uri","__file_path"],
            "uris":
                ["https://static.imply.io/example-data/trips/trips_xaa.csv.gz",
                "https://static.imply.io/example-data/trips/trips_xac.csv.gz"]}', '{"type":"csv","findColumnsFromHeader":false,"columns":["trip_id","vendor_id","pickup_datetime","dropoff_datetime","store_and_fwd_flag","rate_code_id","pickup_longitude","pickup_latitude","dropoff_longitude","dropoff_latitude","passenger_count","trip_distance","fare_amount","extra","mta_tax","tip_amount","tolls_amount","ehail_fee","improvement_surcharge","total_amount","payment_type","trip_type","pickup","dropoff","cab_type","precipitation","snow_depth","snowfall","max_temperature","min_temperature","average_wind_speed","pickup_nyct2010_gid","pickup_ctlabel","pickup_borocode","pickup_boroname","pickup_ct2010","pickup_boroct2010","pickup_cdeligibil","pickup_ntacode","pickup_ntaname","pickup_puma","dropoff_nyct2010_gid","dropoff_ctlabel","dropoff_borocode","dropoff_boroname","dropoff_ct2010","dropoff_boroct2010","dropoff_cdeligibil","dropoff_ntacode","dropoff_ntaname","dropoff_puma"]}')) EXTEND ("__file_uri" VARCHAR, "__file_path" VARCHAR,
      "trip_id" BIGINT, "vendor_id" BIGINT, "pickup_datetime" VARCHAR, "dropoff_datetime" VARCHAR, "store_and_fwd_flag" VARCHAR, "rate_code_id" BIGINT, "pickup_longitude" DOUBLE, "pickup_latitude" DOUBLE, "dropoff_longitude" DOUBLE, "dropoff_latitude" DOUBLE, "passenger_count" BIGINT, "trip_distance" DOUBLE, "fare_amount" DOUBLE, "extra" DOUBLE, "mta_tax" DOUBLE, "tip_amount" DOUBLE, "tolls_amount" DOUBLE, "ehail_fee" VARCHAR, "improvement_surcharge" VARCHAR, "total_amount" DOUBLE, "payment_type" BIGINT, "trip_type" VARCHAR, "pickup" VARCHAR, "dropoff" VARCHAR, "cab_type" VARCHAR, "precipitation" DOUBLE, "snow_depth" BIGINT, "snowfall" DOUBLE, "max_temperature" BIGINT, "min_temperature" BIGINT, "average_wind_speed" DOUBLE, "pickup_nyct2010_gid" BIGINT, "pickup_ctlabel" BIGINT, "pickup_borocode" BIGINT, "pickup_boroname" VARCHAR, "pickup_ct2010" BIGINT, "pickup_boroct2010" BIGINT, "pickup_cdeligibil" VARCHAR, "pickup_ntacode" VARCHAR, "pickup_ntaname" VARCHAR, "pickup_puma" BIGINT, "dropoff_nyct2010_gid" BIGINT, "dropoff_ctlabel" BIGINT, "dropoff_borocode" BIGINT, "dropoff_boroname" VARCHAR, "dropoff_ct2010" BIGINT, "dropoff_boroct2010" BIGINT, "dropoff_cdeligibil" VARCHAR, "dropoff_ntacode" VARCHAR, "dropoff_ntaname" VARCHAR, "dropoff_puma" BIGINT)
)
SELECT
  TIME_PARSE(TRIM("pickup_datetime")) AS "__time",
  "__file_uri",
  "__file_path",
  "trip_id",
  "vendor_id",
  "dropoff_datetime",
  "rate_code_id",
  "passenger_count",
  "trip_distance",
  "fare_amount",
  "extra",
  "mta_tax",
  "tip_amount",
  "tolls_amount",
  "total_amount",
  "payment_type"
FROM "ext"
WHERE "passenger_count" = 5
PARTITIONED BY DAY
```

The following exception is thrown:

```
java.lang.RuntimeException: java.lang.ClassCastException: class org.apache.druid.segment.transform.TransformedInputRow cannot be cast to class org.apache.druid.data.input.ListBasedInputRow (org.apache.druid.segment.transform.TransformedInputRow and org.apache.druid.data.input.ListBasedInputRow are in unnamed module of loader 'app')
	at org.apache.druid.java.util.common.Either.valueOrThrow(Either.java:95)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:259)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.run(FrameProcessorExecutor.java:138)
	at org.apache.druid.msq.exec.WorkerImpl$1$2.run(WorkerImpl.java:836)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:259)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassCastException: class org.apache.druid.segment.transform.TransformedInputRow cannot be cast to class org.apache.druid.data.input.ListBasedInputRow (org.apache.druid.segment.transform.TransformedInputRow and org.apache.druid.data.input.ListBasedInputRow are in unnamed module of loader 'app')
	at org.apache.druid.data.input.ListBasedInputRowAdapter.lambda$columnFunction$1(ListBasedInputRowAdapter.java:57)
	at org.apache.druid.segment.RowBasedColumnSelectorFactory$3.updateCurrentValue(RowBasedColumnSelectorFactory.java:515)
	at org.apache.druid.segment.RowBasedColumnSelectorFactory$3.updateCurrentValueAsNumber(RowBasedColumnSelectorFactory.java:530)
	at org.apache.druid.segment.RowBasedColumnSelectorFactory$3.isNull(RowBasedColumnSelectorFactory.java:463)
	at org.apache.druid.segment.filter.ValueMatchers$3.matches(ValueMatchers.java:244)
	at org.apache.druid.segment.RowBasedCursor.advanceToMatchingRow(RowBasedCursor.java:138)
	at org.apache.druid.segment.RowBasedCursor.<init>(RowBasedCursor.java:85)
	at org.apache.druid.segment.RowBasedStorageAdapter.lambda$makeCursors$0(RowBasedStorageAdapter.java:205)
	at com.google.common.collect.Iterators$6.transform(Iterators.java:828)
	at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:52)
	at org.apache.druid.java.util.common.guava.BaseSequence.toYielder(BaseSequence.java:71)
	at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88)
	at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84)
	at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55)
	at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83)
	at org.apache.druid.java.util.common.guava.Yielders.each(Yielders.java:32)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runWithSegment(ScanQueryFrameProcessor.java:261)
	at org.apache.druid.msq.querykit.BaseLeafFrameProcessor.runIncrementally(BaseLeafFrameProcessor.java:88)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runIncrementally(ScanQueryFrameProcessor.java:163)
	at org.apache.druid.frame.processor.FrameProcessors$1FrameProcessorWithBaggage.runIncrementally(FrameProcessors.java:75)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:230)
	... 8 more
```
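The trace shows the cast failing inside `ListBasedInputRowAdapter`'s column function. This happens while the `WHERE` filter is being evaluated, with `ValueMatchers` called from `RowBasedCursor.advanceToMatchingRow`. The Java sketch below only illustrates that failure pattern. It uses hypothetical, simplified classes (`Row`, `ListRow`, `TransformedRow`, `CastFailureSketch`), not Druid's. An accessor that casts every row to a concrete list-backed type breaks as soon as a row arrives wrapped in a transforming decorator, whereas a name-based accessor keeps working. Treating the adapter's lookup as positional is an assumption of the sketch. It does not show where Druid introduces the wrapper or what the real fix should be.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified stand-ins for illustration only; these are NOT Druid's classes.
public class CastFailureSketch {

  /** Minimal row abstraction (stand-in for an input-row interface). */
  interface Row {
    Object getRaw(String column);
  }

  /** Row backed by a positional value list plus a column-name signature. */
  static final class ListRow implements Row {
    private final List<String> signature;
    private final List<Object> values;

    ListRow(List<String> signature, List<Object> values) {
      this.signature = signature;
      this.values = values;
    }

    Object valueAt(int index) {
      return values.get(index);
    }

    @Override
    public Object getRaw(String column) {
      int i = signature.indexOf(column);
      return i < 0 ? null : values.get(i);
    }
  }

  /** Decorator that overrides some columns and delegates everything else. */
  static final class TransformedRow implements Row {
    private final Row delegate;
    private final Map<String, Object> overrides;

    TransformedRow(Row delegate, Map<String, Object> overrides) {
      this.delegate = delegate;
      this.overrides = overrides;
    }

    @Override
    public Object getRaw(String column) {
      return overrides.containsKey(column) ? overrides.get(column) : delegate.getRaw(column);
    }
  }

  /** Positional accessor that assumes every row is a ListRow; throws ClassCastException otherwise. */
  static Function<Row, Object> positionalColumn(int index) {
    return row -> ((ListRow) row).valueAt(index);
  }

  /** Name-based accessor that works for any Row implementation, wrapped or not. */
  static Function<Row, Object> namedColumn(String column) {
    return row -> row.getRaw(column);
  }

  /** Evaluates a filter shaped like passenger_count = 5 through the given accessor. */
  static boolean matchesFive(Row row, Function<Row, Object> accessor) {
    Object value = accessor.apply(row);
    return value instanceof Number && ((Number) value).longValue() == 5L;
  }

  public static void main(String[] args) {
    List<String> signature = Arrays.asList("__file_uri", "passenger_count");
    Row plain = new ListRow(signature, Arrays.<Object>asList(
        "https://static.imply.io/example-data/trips/trips_xaa.csv.gz", 5L));
    Row wrapped = new TransformedRow(plain, Map.of());

    System.out.println(matchesFive(plain, positionalColumn(1)));              // plain row: cast succeeds
    System.out.println(matchesFive(wrapped, namedColumn("passenger_count"))); // wrapped row: lookup by name succeeds
    try {
      matchesFive(wrapped, positionalColumn(1)); // wrapped row: cast to ListRow fails
    } catch (ClassCastException e) {
      System.out.println("caught " + e.getClass().getSimpleName());
    }
  }
}
```

Running `main` prints `true` twice and then `caught ClassCastException`. That mirrors a filter passing on an unwrapped row but failing once the same row is decorated.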

The same query succeeds without error on Druid 29.

An initial check by @clintropolis suggests it may be related to #15681.
