Determine why deep copies are needed for native_iceberg_compat scan in some cases #2147

@andygrove

Description

What is the problem the feature request solves?

If I remove the CopyExecs that perform a deep copy, then the Spark SQL test "filtering ratio policy fallback" fails with memory corruption issues. The test uses the native_iceberg_compat scan, so the failure cannot be caused by reuse of mutable buffers in the Parquet path. Presumably it is a use-after-free issue across the FFI boundary.
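For context, CopyExec deep-copies each batch it receives so that downstream operators own their buffers. Below is a minimal sketch of what a deep copy of a RecordBatch looks like in arrow-rs; this is an illustration only, not Comet's actual CopyExec implementation, and deep_copy_batch is a hypothetical helper name:

use arrow::array::{make_array, ArrayRef, MutableArrayData};
use arrow::record_batch::RecordBatch;

/// Rebuild every column so the returned batch owns its memory and does
/// not alias buffers that the producer may reuse or free.
fn deep_copy_batch(batch: &RecordBatch) -> RecordBatch {
    let columns: Vec<ArrayRef> = batch
        .columns()
        .iter()
        .map(|col| {
            let data = col.to_data();
            // MutableArrayData copies the underlying buffers rather than
            // sharing them, which is what makes this a deep copy.
            let mut copy = MutableArrayData::new(vec![&data], false, data.len());
            copy.extend(0, 0, data.len());
            make_array(copy.freeze())
        })
        .collect();
    RecordBatch::try_new(batch.schema(), columns)
        .expect("schema is unchanged by copying")
}

If the use-after-free theory is right, skipping this copy leaves the downstream operator holding Arrow buffers whose backing memory the native scan has already released.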

I plan on debugging this to find the exact root cause.

Describe the potential solution

No response

Additional context

To reproduce, change can_reuse_input_batch to always return false, which prevents the CopyExecs from being inserted:

fn can_reuse_input_batch(_op: &Arc<dyn ExecutionPlan>) -> bool {
    // Report that no operator reuses its input batches, so the planner
    // never inserts a CopyExec to deep-copy them.
    false
}

In the sbt interactive shell:

sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -- -z "filtering ratio policy fallback"
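The same test can also be run as a one-shot command from the shell (assuming the standard build/sbt launcher at the root of the checkout):

build/sbt 'sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -- -z "filtering ratio policy fallback"'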

The test fails with:

[info]   == Results ==
[info]   !== Correct Answer - 4 ==   == Spark Answer - 0 ==
[info]    struct<>                   struct<>
[info]   ![1070,2,10,4]              
[info]   ![1080,3,20,4]              
[info]   ![1090,3,10,4]              
[info]   ![1100,3,10,4] (QueryTest.scala:244)
