# Native DataFusion Scan Test Analysis (Spark 3.5.7)
## Overview

This analysis covers tests that were previously ignored for the `native_datafusion` scan mode via `IgnoreCometNativeScan` or `IgnoreCometNativeDataFusion` tags in the Spark 3.5.7 diff. Each test was run with `spark.comet.scan.impl=native_datafusion` to determine whether the ignore directive is still necessary.
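For reference, the scan implementation under test is selected per session. A minimal PySpark sketch (assuming a Spark build with the Comet plugin on the classpath; treat the exact plugin wiring as illustrative):

```python
from pyspark.sql import SparkSession

# Illustrative sketch: enable Comet and select the scan mode under test.
spark = (
    SparkSession.builder
    .config("spark.plugins", "org.apache.spark.CometPlugin")
    .config("spark.comet.enabled", "true")
    .config("spark.comet.scan.impl", "native_datafusion")  # scan mode under test
    .getOrCreate()
)
```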
## Summary

- Total tests with ignore directives removed: 8 (across 3 test files)
- Tests now passing: 3 (`ParquetEncryptionSuite`)
- Tests still failing: 5 (`ParquetV1FilterSuite`: 4, `ParquetV1QuerySuite`: 1)
- Diff updated: yes; `IgnoreCometNativeScan` was removed from the 3 passing encryption tests
## Tests Now Passing (Ignore Removed)

### ParquetEncryptionSuite (sql/hive)

All three encryption tests now pass with `native_datafusion`:
| Test | Previous Ignore Reason |
| --- | --- |
| SPARK-34990: Write and read an encrypted parquet | no encryption support yet |
| SPARK-37117: Can't read files in Parquet encryption external key material mode | no encryption support yet |
| SPARK-42114: Test of uniform parquet encryption | no encryption support yet |
These tests verify that Spark can write and read encrypted Parquet files. The native
DataFusion scan now handles encrypted Parquet correctly, so the ignore directives were
removed from the diff.
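The kind of round trip these tests exercise can be sketched in PySpark. The property names below come from Parquet Modular Encryption and `InMemoryKMS` is the test-only mock KMS; this is an illustrative config fragment (assuming an active `spark` session), not the suites' actual code:

```python
# Hedged sketch of an encrypted Parquet round trip.
spark.conf.set(
    "spark.hadoop.parquet.crypto.factory.class",
    "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
spark.conf.set(
    "spark.hadoop.parquet.encryption.kms.client.class",
    "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
spark.conf.set(
    "spark.hadoop.parquet.encryption.key.list",
    "key1:AAECAwQFBgcICQoLDA0ODw==, key2:AAECAAECAAECAAECAAECAA==")

df = spark.range(10).withColumnRenamed("id", "value")
(df.write.mode("overwrite")
   .option("parquet.encryption.column.keys", "key1:value")
   .option("parquet.encryption.footer.key", "key2")
   .parquet("/tmp/encrypted_parquet"))

# With native_datafusion enabled, this scan now reads the encrypted file.
spark.read.parquet("/tmp/encrypted_parquet").count()
```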
## Tests Still Failing (Ignore Retained)

### ParquetV1FilterSuite (sql/core)

All four tests fail only in the V1 source path (`ParquetV1FilterSuite`). The corresponding V2 tests (`ParquetV2FilterSuite`) pass because V2 sources don't use Comet's native scan.
#### 1. Filters should be pushed down for vectorized Parquet reader at row group level

- Ignore reason: Native scans do not support the tested accumulator
- Failure type: `TestFailedException` (assertion failure)
- Details: The test checks that Parquet filter pushdown works at the row group level by examining a custom accumulator that counts row groups. The native DataFusion scan does not support Spark's accumulator mechanism for tracking pushed-down filter statistics, so the assertion on the accumulator value fails.
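The row-group-level check can be illustrated with a pure-Python analogy (hypothetical, not Comet or Spark code): a counter plays the role of the accumulator the test asserts on, and min/max statistics prune groups before they are read.

```python
# Hypothetical analogy of row-group pruning; the counter stands in for
# Spark's pushed-down-filter accumulator that the test inspects.
row_groups = [
    {"min": 0,  "max": 9,  "rows": [0, 3, 7]},
    {"min": 10, "max": 19, "rows": [12, 15]},
    {"min": 20, "max": 29, "rows": [21, 28]},
]

groups_read = 0  # plays the role of the accumulator

def scan_greater_equal(lo):
    """Read only row groups whose [min, max] range can contain a match."""
    global groups_read
    out = []
    for rg in row_groups:
        if rg["max"] < lo:       # statistics prove no row in this group matches
            continue             # group skipped: counter untouched
        groups_read += 1
        out.extend(r for r in rg["rows"] if r >= lo)
    return out

result = scan_greater_equal(15)  # first group is pruned by its max statistic
```

A test like the one above asserts on `groups_read`; if the scan implementation never updates the accumulator, the assertion fails even when the query result is correct.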
#### 2. filter pushdown - StringPredicate

- Failure type: `TestFailedException` (assertion failure)
- Details: Tests that `StartsWith`, `EndsWith`, and `Contains` string predicates are pushed down into the Parquet reader. The native DataFusion scan does not push these string predicates down in the same way Spark's built-in reader does, causing the assertions on pushed filter counts to fail.
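To see why engines can legitimately differ here, consider how a `StartsWith` predicate can prune a row group from min/max string statistics alone (a hypothetical sketch, not either engine's code):

```python
# Hypothetical sketch: a group can be skipped for StartsWith(prefix) when
# min/max statistics prove no contained string can begin with the prefix.
def can_skip_startswith(prefix, g_min, g_max):
    # Every string starting with `prefix` sorts >= prefix, and its first
    # len(prefix) characters equal the prefix. So the group is prunable if
    # all its values sort before the prefix, or strictly after it.
    return g_max < prefix or g_min[:len(prefix)] > prefix

can_skip_startswith("b", "ca", "cz")  # all values sort after any "b..." string
can_skip_startswith("b", "aa", "bz")  # range may contain a "b..." string
```

`Contains` and `EndsWith` admit no such simple statistics-based rule, which is one reason different readers expose different pushed-filter counts for string predicates.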
#### 3. SPARK-17091: Convert IN predicate to Parquet filter push-down

- Ignore reason: Comet has different push-down behavior
- Failure type: `CometRuntimeException: CometNativeExec should not be executed directly without a serialized plan`
- Details: The test constructs a DataFrame with specific filters and directly executes it in a way that triggers `CometNativeScan` without going through the proper native execution plan serialization. This is a fundamental incompatibility with how the native DataFusion scan handles standalone execution outside of a full native plan.
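The failure mode reduces to an operator that only runs as part of a serialized native plan. A hypothetical Python analogy (not Comet source code) of that contract:

```python
# Hypothetical analogy: an operator that refuses standalone execution.
class NativeExec:
    def __init__(self, serialized_plan=None):
        # In the real system the serialized plan is produced by the query
        # planner; tests that execute the scan directly never attach one.
        self.serialized_plan = serialized_plan

    def execute(self):
        if self.serialized_plan is None:
            raise RuntimeError(
                "CometNativeExec should not be executed directly "
                "without a serialized plan")
        return f"running plan of {len(self.serialized_plan)} bytes"
```

Executing `NativeExec()` directly raises, mirroring the `CometRuntimeException` these tests hit, while `NativeExec(b"...")` (a plan attached by the planner) runs normally.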
#### 4. SPARK-34562: Bloom filter push down

- Ignore reason: Native scans do not support the tested accumulator
- Failure type: `TestFailedException` (assertion failure)
- Details: Similar to test #1, this test relies on a custom accumulator to verify that bloom filter push-down is working. The native DataFusion scan does not integrate with Spark's accumulator framework for this purpose.
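For context, a Bloom filter answers "definitely not present" or "maybe present"; a negative answer lets a reader skip a whole row group without decoding it. A minimal illustrative sketch (not Parquet's actual implementation):

```python
# Minimal Bloom filter sketch: k hash positions set in an m-bit field.
class BloomSketch:
    def __init__(self, m=64, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Toy hashing scheme for illustration only.
        return [hash((i, item)) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False => item is definitely absent, so the row group can be skipped.
        # True  => item may be present (false positives are possible).
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomSketch()
for v in ("a", "b", "c"):
    bf.add(v)
```

The Spark test verifies this skipping happened by reading an accumulator; as in test #1, the native scan never updates it.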
### ParquetV1QuerySuite (sql/core)
#### 5. SPARK-26677: negated null-safe equality comparison should not filter matched row groups

- Ignore reason: Native scans had the filter pushed into DF operator, cannot strip
- Failure type: `CometRuntimeException: CometNativeExec should not be executed directly without a serialized plan`
- Details: The test verifies that a negated null-safe equality filter (`NOT (value <=> 'A')`) does not incorrectly filter out row groups. With the native DataFusion scan, the filter gets pushed into the DataFusion operator rather than being handled at the Spark level. When the test tries to execute the scan directly, it hits the same serialization issue as SPARK-17091 above.
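The semantics being tested can be shown in plain Python (an illustrative model of Spark's `<=>` operator, not Spark code): null-safe equality treats two NULLs as equal, so its negation must keep NULL rows rather than dropping them the way `!=` would.

```python
# Model of Spark's null-safe equality operator <=> (None stands in for NULL).
def null_safe_eq(a, b):
    if a is None and b is None:
        return True          # NULL <=> NULL is true
    if a is None or b is None:
        return False         # NULL <=> x is false
    return a == b

rows = ["A", None, "B"]
# NOT (value <=> 'A') must keep the NULL row as well as "B".
kept = [v for v in rows if not null_safe_eq(v, "A")]
```

A row group containing only NULLs therefore matches `NOT (value <=> 'A')` and must not be pruned; the test checks exactly that.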
## Root Causes

The 5 still-failing tests fall into two categories:

1. Accumulator incompatibility (tests #1, #2, #4): The native DataFusion scan bypasses Spark's internal accumulator mechanism used to track filter pushdown statistics. Tests that assert on these accumulator values will fail.
2. Direct execution without serialized plan (tests #3, #5): The native DataFusion scan requires execution through a serialized native plan. When tests construct and execute scans directly (outside of the normal query planning flow), they hit a `CometRuntimeException` because `CometNativeScan` cannot be executed standalone.
## Note on V2 Tests

The `ParquetV2FilterSuite` and `ParquetV2QuerySuite` variants of these tests all pass because they use `USE_V1_SOURCE_LIST = ""`, which means Spark uses V2 data sources instead of V1. Comet's native scan only intercepts V1 Parquet sources, so V2 tests effectively run without Comet's native scan and pass trivially.
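The V1/V2 switch corresponds to a public Spark conf (the suites set it via the internal `SQLConf` constant; shown here as a config fragment, assuming an active `spark` session):

```python
# Empty list: no sources are forced onto the V1 path, so Parquet reads go
# through the DataSource V2 code path, which Comet's native scan does not
# intercept.
spark.conf.set("spark.sql.sources.useV1SourceList", "")
```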