diff --git a/content/blog/2025-07-01-datafusion-comet-0.9.0.md b/content/blog/2025-07-01-datafusion-comet-0.9.0.md
new file mode 100644
index 00000000..7fc35e69
--- /dev/null
+++ b/content/blog/2025-07-01-datafusion-comet-0.9.0.md
@@ -0,0 +1,177 @@
+---
+layout: post
+title: Apache DataFusion Comet 0.9.0 Release
+date: 2025-07-01
+author: pmc
+categories: [subprojects]
+---
+
+The Apache DataFusion PMC is pleased to announce version 0.9.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.
+
+Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.
+
+This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24
+contributors. See the [change log] for more information.
+
+[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md
+
+## Release Highlights
+
+### Complex Type Support in Parquet Scans
+
+Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This support is not yet
+available when reading Parquet files through Apache Iceberg.
+
+In previous releases, this functionality was only available when manually selecting one of the new experimental
+scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, so
+no manual configuration is required.
+
+### Complex Type Processing Improvements
+
+This release includes numerous improvements to complex type support that ensure Spark-compatible behavior when
+casting between structs and when accessing fields within deeply nested types.
+
+### Shuffle Improvements
+
+Comet now accelerates a broader range of shuffle operations, so more queries run fully natively. In previous
+releases, some shuffle operations fell back to Spark to avoid known bugs in Comet; these bugs have now been fixed.
+
+### New Features
+
+Comet 0.9.0 adds support for the following Spark expressions:
+
+- ArrayDistinct
+- ArrayMax
+- ArrayRepeat
+- ArrayUnion
+- BitCount
+- BitNot
+- Expm1
+- MapValues
+- Signum
+- ToPrettyString
+- map[]
+
+### Improved Spark SQL Test Coverage
+
+Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against
+Spark 3.5.6). The remaining 3% of tests are ignored or canceled for various reasons. For example, some are too
+specific to Spark internals, and others test features that are not relevant to Comet, such as whole-stage code
+generation, which a vectorized execution engine does not need.
+
+This release contains numerous bug fixes that were needed to achieve this coverage, including improved support for
+exchange reuse when Adaptive Query Execution (AQE) is enabled.
+
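+<table>
+  <thead>
+    <tr>
+      <th>Module</th>
+      <th>Passed</th>
+      <th>Ignored</th>
+      <th>Canceled</th>
+      <th>Total</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr><td>catalyst</td><td>7,232</td><td>5</td><td>1</td><td>7,238</td></tr>
+    <tr><td>core-1</td><td>9,186</td><td>246</td><td>6</td><td>9,438</td></tr>
+    <tr><td>core-2</td><td>2,649</td><td>393</td><td>0</td><td>3,042</td></tr>
+    <tr><td>core-3</td><td>1,757</td><td>136</td><td>16</td><td>1,909</td></tr>
+    <tr><td>hive-1</td><td>2,174</td><td>14</td><td>4</td><td>2,192</td></tr>
+    <tr><td>hive-2</td><td>19</td><td>1</td><td>4</td><td>24</td></tr>
+    <tr><td>hive-3</td><td>1,058</td><td>11</td><td>4</td><td>1,073</td></tr>
+    <tr><td>Total</td><td>24,075</td><td>806</td><td>31</td><td>24,912</td></tr>
+  </tbody>
+</table>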
+
+### Memory & Performance Tracing
+
+Comet now provides a tracing feature for analyzing performance and off-heap versus on-heap memory usage. See the
+[Comet Tracing Guide] for more information.
+
+[Comet Tracing Guide]: https://datafusion.apache.org/comet/contributor-guide/tracing.html
+
+### Spark Compatibility
+
+- Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
+- Spark 3.5.4 through 3.5.6 with JDK 11 & 17, Scala 2.12 & 2.13
+- Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13
+
+We are looking for help from the community to fully support Spark 4.0.0. See [EPIC: Support 4.0.0] for more information.
+
+[EPIC: Support 4.0.0]: https://github.com/apache/datafusion-comet/issues/1637
+
+Note that Java 8 support was removed from this release because Apache Arrow no longer supports it.
+
+## Getting Involved
+
+The Comet project welcomes new contributors. We use the same [Slack and Discord] channels as the main DataFusion
+project and have a weekly [DataFusion video call].
+
+[Slack and Discord]: https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord
+[DataFusion video call]: https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing
+
+The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or
+performance regressions that you find. See the [Getting Started] guide for instructions on downloading and installing
+Comet.
+
+[Getting Started]: https://datafusion.apache.org/comet/user-guide/installation.html
+
+There are also many [good first issues] waiting for contributions.
+
+[good first issues]: https://github.com/apache/datafusion-comet/contribute
diff --git a/content/images/comet-0.9.0/tracing.png b/content/images/comet-0.9.0/tracing.png
new file mode 100644
index 00000000..3311ba55
Binary files /dev/null and b/content/images/comet-0.9.0/tracing.png differ