diff --git a/content/blog/2025-07-01-datafusion-comet-0.9.0.md b/content/blog/2025-07-01-datafusion-comet-0.9.0.md new file mode 100644 index 00000000..7fc35e69 --- /dev/null +++ b/content/blog/2025-07-01-datafusion-comet-0.9.0.md @@ -0,0 +1,177 @@ +--- +layout: post +title: Apache DataFusion Comet 0.9.0 Release +date: 2025-07-01 +author: pmc +categories: [subprojects] +--- + + + +The Apache DataFusion PMC is pleased to announce version 0.9.0 of the [Comet](https://datafusion.apache.org/comet/) subproject. + +Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes. + +This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24 +contributors. See the [change log] for more information. + +[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md + +## Release Highlights + +### Complex Type Support in Parquet Scans + +Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This functionality is not +yet available when reading Parquet files from Apache Iceberg. + +This functionality was only available in previous releases when manually specifying one of the new experimental +scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, and no +longer requires manual configuration. + +### Complex Type Processing Improvements + +Numerous improvements have been made to complex type support to ensure Spark-compatible behavior when casting between +structs and accessing fields within deeply nested types. + +### Shuffle Improvements + +Comet now accelerates a broader range of shuffle operations, leading to more queries running fully natively. In +previous releases, some shuffle operations fell back to Spark to avoid some known bugs in Comet, and these bugs have +now been fixed. + +### New Features + +Comet 0.9.0 adds support for the following Spark expressions: + +- ArrayDistinct +- ArrayMax +- ArrayRepeat +- ArrayUnion +- BitCount +- BitNot +- Expm1 +- MapValues +- Signum +- ToPrettyString +- map[] + +### Improved Spark SQL Test Coverage + +Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against +Spark 3.5.6). The remaining 3% of tests are ignored for various reasons, such as being too specific to Spark +internals, or testing for features that are not relevant to Comet, such as whole-stage code generation, which +is not needed when using a vectorized execution engine. + +This release contains numerous bug fixes to achieve this coverage, including improved support for exchange reuse +when AQE is enabled. + + + +
| Module | +Passed | +Ignored | +Canceled | +Total | +
|---|---|---|---|---|
| catalyst | 7,232 | 5 | 1 | 7,238 |
| core-1 | 9,186 | 246 | 6 | 9,438 |
| core-2 | 2,649 | 393 | 0 | 3,042 |
| core-3 | 1,757 | 136 | 16 | 1,909 |
| hive-1 | 2,174 | 14 | 4 | 2,192 |
| hive-2 | 19 | 1 | 4 | 24 |
| hive-3 | 1,058 | 11 | 4 | 1,073 |
| Total | 24,075 | 806 | 31 | 24,912 |