What is the problem the feature request solves?
Comet currently assumes that all native processing uses the UTC timezone. When reading from Parquet sources, Comet converts timestamps to UTC.
String timeZoneId = conf.get("spark.sql.session.timeZone");
// The session time zone is read above, but the native code always passes "UTC"
// as the timeZoneId when converting from the Spark schema to the Arrow schema.
Schema arrowSchema = Utils$.MODULE$.toArrowSchema(sparkSchema, "UTC");
byte[] serializedRequestedArrowSchema = serializeArrowSchema(arrowSchema);
Schema dataArrowSchema = Utils$.MODULE$.toArrowSchema(dataSchema, "UTC");
byte[] serializedDataArrowSchema = serializeArrowSchema(dataArrowSchema);
However, we are now seeing that this causes correctness issues or exceptions when the data source is not Parquet.
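For illustration, here is a minimal sketch of the underlying hazard, using only java.time rather than Comet or Arrow APIs: the same physical instant produces different local wall-clock values depending on which zone ID is applied, so hard-coding "UTC" instead of honoring spark.sql.session.timeZone can change results.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimeZoneSketch {
    public static void main(String[] args) {
        // One physical instant: the Unix epoch.
        Instant instant = Instant.ofEpochSecond(0);

        // Rendering it under UTC vs. a session time zone yields
        // different local date-time values for the same instant.
        ZonedDateTime utc = instant.atZone(ZoneId.of("UTC"));
        ZonedDateTime la  = instant.atZone(ZoneId.of("America/Los_Angeles"));

        System.out.println(utc.toLocalDateTime()); // 1970-01-01T00:00
        System.out.println(la.toLocalDateTime());  // 1969-12-31T16:00
    }
}
```

Any code path that bakes in one zone while the engine around it uses another risks exactly this kind of off-by-hours (or off-by-a-day) discrepancy.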
This epic is for reviewing and discussing Comet's approach to time zones.
Describe the potential solution
No response
Additional context
No response