[EPIC] [DISCUSS] Comet timezone handling #2733

@andygrove

Description

What is the problem the feature request solves?

Comet currently assumes that all native processing uses the UTC timezone. When reading from Parquet sources, Comet converts timestamps to UTC.

      String timeZoneId = conf.get("spark.sql.session.timeZone");
      // Native code uses "UTC" always as the timeZoneId when converting from spark to arrow schema.
      Schema arrowSchema = Utils$.MODULE$.toArrowSchema(sparkSchema, "UTC");
      byte[] serializedRequestedArrowSchema = serializeArrowSchema(arrowSchema);
      Schema dataArrowSchema = Utils$.MODULE$.toArrowSchema(dataSchema, "UTC");
      byte[] serializedDataArrowSchema = serializeArrowSchema(dataArrowSchema);

However, we are now seeing that this causes correctness issues or exceptions when the data source is not Parquet:

This epic is for reviewing and discussing Comet's approach to time zones.
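
As a starting point for discussion, here is a minimal sketch of the general hazard, not Comet's actual code path. It assumes a source that hands over a wall-clock value with no zone attached, and shows that resolving it against "UTC" rather than a session time zone yields a different instant. The class name `TimezoneSketch`, the helper `toInstant`, and the sample date and zone are all hypothetical; only standard `java.time` calls are used.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical illustration only; none of these names exist in Comet.
final class TimezoneSketch {

    // Resolve a zone-less wall-clock value to an instant in the given zone.
    static Instant toInstant(LocalDateTime wallClock, String zoneId) {
        return wallClock.atZone(ZoneId.of(zoneId)).toInstant();
    }

    public static void main(String[] args) {
        // Sample value, assumed for illustration: noon on 2024-01-15.
        LocalDateTime wallClock = LocalDateTime.of(2024, 1, 15, 12, 0);

        // Same wall-clock value, two different zone assumptions.
        Instant asUtc = toInstant(wallClock, "UTC");
        Instant asSession = toInstant(wallClock, "America/Los_Angeles");

        System.out.println(asUtc);      // 2024-01-15T12:00:00Z
        System.out.println(asSession);  // 2024-01-15T20:00:00Z
        System.out.println(Duration.between(asUtc, asSession).toHours()); // 8
    }
}
```

If one side of a pipeline assumes "UTC" and the other assumes the value of `spark.sql.session.timeZone`, a gap of this kind could show up as shifted timestamps, which may be one of the things worth checking for each non-Parquet source.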

Describe the potential solution

No response

Additional context

No response
