I was manually experimenting with some cast operations, based on my experience implementing them in Spark RAPIDS, and found the following example of incorrect behavior. I would recommend implementing some fuzz tests to find these kinds of issues.
Test data
scala> robots.show
+------+
| name|
+------+
|WALL-E|
| R2D2|
| T2|
+------+
Test with Comet
scala> import org.apache.spark.sql.types._
scala> import org.apache.spark.sql.functions.col
scala> val df = robots.withColumn("date", col("name").cast(DataTypes.TimestampType))
scala> df.show
+------+----+
| name|date|
+------+----+
|WALL-E|null|
| R2D2|null|
| T2|null|
+------+----+
Test with Spark
scala> spark.conf.set("spark.comet.enabled", false)
scala> df.show
+------+-------------------+
| name| date|
+------+-------------------+
|WALL-E| null|
| R2D2| null|
| T2|2024-02-09 02:00:00|
+------+-------------------+
T2 is a valid timestamp: T is the separator between the optional date portion and the time portion, and 2 on its own is a valid time because the remaining time fields are optional. Spark therefore parses it as 2 AM on the current date, while Comet returns null.
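A fuzz test along the lines suggested above could compare the same cast with Comet enabled and disabled and diff the results. This is a minimal sketch, not Comet's actual test harness: it assumes a `SparkSession` named `spark` with Comet on the classpath, and the `CastFuzzTest` object, its fragment pool, and the fixed seed are all hypothetical choices for illustration.

```scala
import scala.util.Random

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DataTypes

object CastFuzzTest {
  // Bias the random strings toward timestamp-like input so that edge
  // cases such as "T2" are actually exercised, not just pure noise.
  private val fragments = Seq("T", "-", ":", ".", " ", "0", "2", "9", "2024")

  private def randomInput(r: Random): String =
    (0 until r.nextInt(10)).map(_ => fragments(r.nextInt(fragments.length))).mkString

  def run(spark: SparkSession, iterations: Int = 1000): Unit = {
    val r = new Random(42) // fixed seed so failures are reproducible
    val inputs = (0 until iterations).map(i => Tuple1(randomInput(r)))
    val df = spark.createDataFrame(inputs).toDF("s")
      .withColumn("ts", col("s").cast(DataTypes.TimestampType))

    // collect() re-executes the plan, so toggling the config between the
    // two collects runs the cast once on Comet and once on vanilla Spark.
    spark.conf.set("spark.comet.enabled", "true")
    val cometRows = df.collect().toSeq
    spark.conf.set("spark.comet.enabled", "false")
    val sparkRows = df.collect().toSeq

    val mismatches = cometRows.zip(sparkRows).filter { case (a, b) => a != b }
    mismatches.foreach { case (comet, vanilla) =>
      println(s"MISMATCH: comet=$comet spark=$vanilla")
    }
    assert(mismatches.isEmpty, s"${mismatches.size} cast mismatches found")
  }
}
```

With the bug above, a run like this would flag inputs such as "T2", where vanilla Spark produces a timestamp and Comet produces null.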