Description
I discovered this bug with this query:
> SELECT tpep_pickup_datetime FROM taxi LIMIT 1;
which fails with:
General("InvalidArgumentError(\"column types must match schema types, expected Timestamp(Microsecond, None) but found UInt64 at column index 0\")")
The parquet reader detects this schema when reading from the file:
Schema {
    fields: [
        Field { name: "tpep_pickup_datetime", data_type: Timestamp(Microsecond, None), nullable: true, dict_id: 0, dict_is_ordered: false }
    ],
    metadata: {}
}
The struct array read from the file contains:
[PrimitiveArray<UInt64>
[
    1567318008000000,
    1567319357000000,
    1567320092000000,
    1567321151000000,
    ...
When the Parquet arrow reader creates the record batch, the following validation logic fails:
for i in 0..columns.len() {
    if columns[i].len() != len {
        return Err(ArrowError::InvalidArgumentError(
            "all columns in a record batch must have the same length".to_string(),
        ));
    }
    if columns[i].data_type() != schema.field(i).data_type() {
        return Err(ArrowError::InvalidArgumentError(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            schema.field(i).data_type(),
            columns[i].data_type(),
            i
        )));
    }
}
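To illustrate why the validation rejects the batch, here is a minimal, self-contained sketch of the same check. The `DataType`, `Field`, and `Column` types below are simplified stand-ins, not the real arrow-rs API: the point is that the reader materialises the values as plain `UInt64` while the schema declares a microsecond `Timestamp`, so the type comparison can never succeed.

```rust
// Hypothetical, simplified stand-ins for the arrow schema types.
#[derive(Debug, PartialEq, Clone)]
enum DataType {
    UInt64,
    TimestampMicrosecond,
}

struct Field {
    name: String,
    data_type: DataType,
}

struct Column {
    data_type: DataType,
    values: Vec<u64>,
}

// Mirrors the validation loop from the issue: each column's type must
// equal the corresponding schema field's type.
fn validate(schema: &[Field], columns: &[Column]) -> Result<(), String> {
    for (i, (field, col)) in schema.iter().zip(columns.iter()).enumerate() {
        if col.data_type != field.data_type {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                field.data_type, col.data_type, i
            ));
        }
    }
    Ok(())
}

fn main() {
    // Schema taken from the parquet file metadata: a microsecond timestamp.
    let schema = vec![Field {
        name: "tpep_pickup_datetime".to_string(),
        data_type: DataType::TimestampMicrosecond,
    }];
    // But the reader produced a plain UInt64 array for the same column.
    let column = Column {
        data_type: DataType::UInt64,
        values: vec![1567318008000000, 1567319357000000],
    };
    let err = validate(&schema, &[column]).unwrap_err();
    println!("{}", err);
}
```

A fix would have to happen upstream of this check, by having the reader cast or annotate the raw `INT64`/`UInt64` values with the timestamp type from the schema before the record batch is assembled.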
Reporter: Andy Grove / @andygrove
Assignee: Renjie Liu / @liurenjie1024
Related issues:
Note: This issue was originally created as ARROW-8258. Please see the migration documentation for further details.