Python: Inconsistency around timezones #6945

@Fokko

Apache Iceberg version

main (development)

Query engine

Other

Please describe the bug 🐞

With PyIceberg, when a filter removes every row of a DataFile, the read fails with:

ArrowInvalid: Schema at index 1 was different: 
vendor_id: int32
pickup_time: timestamp[us, tz=+00:00]
pickup_location_id: int32
dropoff_time: timestamp[us, tz=+00:00]
dropoff_location_id: int32
passenger_count: int32
trip_distance: double
ratecode_id: int32
payment_type: int32
total_amount: double
fare_amount: double
tip_amount: double
tolls_amount: double
mta_tax: double
improvement_surcharge: double
congestion_surcharge: double
extra_surcharges: double
store_and_forward_flag: string
vs
vendor_id: int32
pickup_time: timestamp[us, tz=UTC]
pickup_location_id: int32
dropoff_time: timestamp[us, tz=UTC]
dropoff_location_id: int32
passenger_count: int32
trip_distance: double
ratecode_id: int32
payment_type: int32
total_amount: double
fare_amount: double
tip_amount: double
tolls_amount: double
mta_tax: double
improvement_surcharge: double
congestion_surcharge: double
extra_surcharges: double
store_and_forward_flag: string

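The empty tables that we're concatenating carry the timezone +00:00, while the tables that actually contain data carry UTC. Both denote the same offset, but Arrow treats the two spellings as different types, so the schemas do not match.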
