Cast string to timestamp not compatible with Spark logic #14

@andygrove

Description

I was manually experimenting with some cast operations, based on my experience of implementing them in Spark RAPIDS, and found the following example of incorrect behavior. I would recommend implementing some fuzz tests to find these kinds of issues.
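As a rough illustration of the fuzz-testing idea, here is a minimal sketch of a seeded input generator. The object name `CastFuzzInputs`, the alphabet, and the default length limit are all assumptions made for this example and are not part of Comet or Spark.

```scala
import scala.util.Random

// Hypothetical helper (not part of Comet or Spark): generates short random
// strings built from characters that tend to matter to timestamp parsers.
object CastFuzzInputs {
  // Digits, common date/time separators, and a few letters (assumed alphabet)
  private val Alphabet = "0123456789-:T .Z+WALER"

  /** Returns `count` random strings, each 1 to `maxLen` characters long.
    * The same seed always yields the same inputs, so failures are reproducible. */
  def generate(seed: Long, count: Int, maxLen: Int = 12): Seq[String] = {
    val rng = new Random(seed)
    Seq.fill(count) {
      val len = 1 + rng.nextInt(maxLen)
      Seq.fill(len)(Alphabet.charAt(rng.nextInt(Alphabet.length))).mkString
    }
  }
}
```

The generated strings could then be loaded into a single-column DataFrame, cast to TimestampType once with spark.comet.enabled set to true and once with it set to false, and the two results compared row by row.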

Test data

scala> robots.show
+------+
|  name|
+------+
|WALL-E|
|  R2D2|
|    T2|
+------+

Test with Comet

scala> import org.apache.spark.sql.types._

scala> val df = robots.withColumn("date", col("name").cast(DataTypes.TimestampType))

scala> df.show
+------+----+
|  name|date|
+------+----+
|WALL-E|null|
|  R2D2|null|
|    T2|null|
+------+----+

Test with Spark

scala> spark.conf.set("spark.comet.enabled", false)

scala> df.show
+------+-------------------+
|  name|               date|
+------+-------------------+
|WALL-E|               null|
|  R2D2|               null|
|    T2|2024-02-09 02:00:00|
+------+-------------------+

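Spark treats T2 as a valid timestamp: T is the separator between the optional date and the time portion, and 2 is a valid time because the minute and second fields are optional. Spark therefore reads it as 02:00:00, as the output above shows, while Comet returns null.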
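The rule described above can be sketched as a small standalone check. This is a simplified model written for illustration, not Spark's actual parser: it only covers the form "T followed by hour, with optional minute and second", and the object name `TimeOnlySketch` is hypothetical.

```scala
// Illustrative model only (NOT Spark's parser): accepts "T" followed by
// hour[:minute[:second]], where the minute and second fields are optional.
object TimeOnlySketch {
  private val TimeOnly = """T(\d{1,2})(?::(\d{1,2}))?(?::(\d{1,2}))?""".r

  /** Returns Some((hour, minute, second)) when `s` matches the modelled form,
    * with missing fields defaulting to 0; returns None otherwise. */
  def parse(s: String): Option[(Int, Int, Int)] = s match {
    case TimeOnly(h, m, sec) =>
      val hour = h.toInt
      // Unmatched optional groups are null, so wrap them in Option
      val minute = Option(m).map(_.toInt).getOrElse(0)
      val second = Option(sec).map(_.toInt).getOrElse(0)
      if (hour < 24 && minute < 60 && second < 60) Some((hour, minute, second))
      else None
    case _ => None
  }
}
```

Under this model, the three names from the test data behave the same way as in the Spark output: WALL-E and R2D2 are rejected, and T2 is accepted as hour 2 with zero minutes and seconds.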

Labels: bug