Closed
Description
Apache Iceberg version
main (development)
Query engine
Other
Please describe the bug 🐞
With PyIceberg, when a filter eliminates every row of a DataFile, we end up with:
ArrowInvalid: Schema at index 1 was different:
vendor_id: int32
pickup_time: timestamp[us, tz=+00:00]
pickup_location_id: int32
dropoff_time: timestamp[us, tz=+00:00]
dropoff_location_id: int32
passenger_count: int32
trip_distance: double
ratecode_id: int32
payment_type: int32
total_amount: double
fare_amount: double
tip_amount: double
tolls_amount: double
mta_tax: double
improvement_surcharge: double
congestion_surcharge: double
extra_surcharges: double
store_and_forward_flag: string
vs
vendor_id: int32
pickup_time: timestamp[us, tz=UTC]
pickup_location_id: int32
dropoff_time: timestamp[us, tz=UTC]
dropoff_location_id: int32
passenger_count: int32
trip_distance: double
ratecode_id: int32
payment_type: int32
total_amount: double
fare_amount: double
tip_amount: double
tolls_amount: double
mta_tax: double
improvement_surcharge: double
congestion_surcharge: double
extra_surcharges: double
store_and_forward_flag: string
The empty tables that we're concatenating carry a +00:00 timezone, while the tables that actually contain data carry UTC:
