Consider removing ParquetFilters #36

@sunchao

What is the problem the feature request solves?

Comet's copy of ParquetFilters was originally added so that we could shade Parquet in Comet. However, the shading was later removed because it caused a lot of trouble on the Iceberg side. In addition, we have ported several Parquet classes from parquet-mr into Comet, so the boundary between Comet and parquet-mr is now fairly thin. We could therefore consider removing Comet's ParquetFilters and using the one from Spark directly.

The advantage is that Comet would pick up changes and improvements made in newer versions of Spark. For instance, apache/spark#36696 added Parquet In/NotIn pushdown on the Spark side, which is only available since Spark 3.4. Because Comet currently keeps its own copy of Spark's ParquetFilters, that feature has not been added, in order to stay backward compatible with Spark 3.2 and 3.3.
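
For context on why In pushdown is worth having, here is a minimal sketch of the general idea. It is not Spark's or Comet's code. It assumes each row group exposes min/max statistics for the filtered column, it ignores nulls, and the names RowGroupStats and can_skip_for_in are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical stand-in for the min/max statistics of one column in one row group."""
    min_value: int
    max_value: int


def can_skip_for_in(stats: RowGroupStats, values: Iterable[int]) -> bool:
    """Return True when no value of the IN list can occur in the row group.

    A value outside [min_value, max_value] cannot be present, so the row
    group is skippable only if every IN value falls outside that range.
    """
    return all(v < stats.min_value or v > stats.max_value for v in values)


# Three row groups covering disjoint ranges of the filtered column.
row_groups = [RowGroupStats(0, 99), RowGroupStats(100, 199), RowGroupStats(200, 299)]

# Equivalent of `WHERE col IN (5, 250)`: only the first and last groups can match.
in_list = [5, 250]
to_read = [rg for rg in row_groups if not can_skip_for_in(rg, in_list)]
```

Without pushdown, all three row groups would be read and filtered afterwards. With it, the middle one is never decoded.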

Describe the potential solution

Evaluate whether we can remove ParquetFilters from Comet.

Additional context

No response

Labels: enhancement (New feature or request)