This issue is created to analyze whether we can upgrade our Spark 3.x dependencies to 3.5.0.
Apache Pinot connects to Apache Spark in two different types of modules:
- batch ingestion plugins (modules `pinot-batch-ingestion-spark-3.2` and `pinot-batch-ingestion-spark-2.4`), used to ingest data from Spark.
- connectors (modules `pinot-spark-3-connector` and `pinot-spark-2-connector`), used to query Pinot from Spark.
Note that we need to support these two major Spark versions for compatibility reasons.
As explained in #11656, one of the reasons we cannot officially support Java 21 is that Spark 3.x does not support Java 21 until version 3.5. There seem to be no issues with Spark 2.x versions.
It is not clear to me exactly which Spark version we are using. As expected, `pinot-batch-ingestion-spark-3.2` uses Spark 3.2.x (specifically 3.2.1), while `pinot-spark-3-connector` uses 3.4.0. It also seems that the Pinot distribution does not include the connectors (which makes sense), so the final jars will contain version 3.2.1.
Spark versioning semantics are defined here. Notice that they do not follow semantic versioning. Specifically:
> An API is any public class or interface exposed in Spark that is not marked as “developer API” or “experimental”. Release A is API compatible with release B if code compiled against release A compiles cleanly against B. Currently, does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we’ll try to guarantee in future releases.
There is also a document where they specify the changes from version to version. We would need to understand the changes from 3.2 to 3.3, 3.3 to 3.4 and finally 3.4 to 3.5.
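To make the review scope concrete, here is a minimal sketch that lists which minor-version steps each module would have to go through. The helper name `minor_hops` is hypothetical. The module versions are the ones mentioned above (3.2.1 and 3.4.0), which may not match what the build actually resolves.

```python
def minor_hops(current: str, target: str) -> list[tuple[str, str]]:
    """Return the consecutive minor-version steps between two releases
    of the same major line, e.g. 3.2.x -> 3.5.x."""
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("only upgrades within one major line are handled")
    # One (from, to) pair per minor release that has to be reviewed.
    return [
        (f"{cur_major}.{m}", f"{cur_major}.{m + 1}")
        for m in range(cur_minor, tgt_minor)
    ]


# Versions as reported in this issue (assumption: they reflect the real build).
modules = {
    "pinot-batch-ingestion-spark-3.2": "3.2.1",
    "pinot-spark-3-connector": "3.4.0",
}

for name, version in modules.items():
    print(name, minor_hops(version, "3.5.0"))
```

Under these assumptions, the batch ingestion plugin needs all three steps reviewed, while the connector only needs the 3.4 to 3.5 changes.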
I've done some checks, modifying the dependency to 3.5.0 and:
I've opened #11702 to test a blind upgrade from current 3.2.4 to 3.5.0