Inputs to SortMergeJoin should be wrapped in a CopyExec with UnpackOrDeepCopy if the inputs can reuse batches #2151

@andygrove

Describe the bug

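In planner.rs, we wrap the inputs to HashJoinExec in a CopyExec, but we do not do the same for SortMergeJoinExec. According to DataFusion's documentation, SortMergeJoinExec can cache input batches, so an input that reuses batches could have its data overwritten while the join is still holding it. The missing copy therefore seems incorrect.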
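To illustrate the suspected hazard, here is a minimal, std-only toy model. It does not use Arrow, DataFusion, or Comet APIs. `Batch`, `ReusingScan`, and `buffer_two` are hypothetical names invented for this sketch, and `deep_copy` only stands in for the general idea of copying an input before an operator holds on to it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy stand-in for a batch whose backing buffer may be reused by its producer.
/// Hypothetical type for illustration only; not an Arrow or Comet type.
#[derive(Clone)]
struct Batch {
    values: Rc<RefCell<Vec<i32>>>,
}

impl Batch {
    /// Deep copy: allocate a fresh buffer that the producer cannot overwrite.
    fn deep_copy(&self) -> Batch {
        Batch {
            values: Rc::new(RefCell::new(self.values.borrow().clone())),
        }
    }

    /// Read the current contents of the batch.
    fn snapshot(&self) -> Vec<i32> {
        self.values.borrow().clone()
    }
}

/// Producer that overwrites one shared buffer on every call,
/// mimicking an input that reuses batches.
struct ReusingScan {
    buffer: Rc<RefCell<Vec<i32>>>,
    next: i32,
}

impl ReusingScan {
    fn new() -> Self {
        ReusingScan {
            buffer: Rc::new(RefCell::new(vec![0; 3])),
            next: 0,
        }
    }

    fn next_batch(&mut self) -> Batch {
        {
            let mut buf = self.buffer.borrow_mut();
            for slot in buf.iter_mut() {
                *slot = self.next;
                self.next += 1;
            }
        }
        // The returned batch shares the producer's buffer.
        Batch {
            values: Rc::clone(&self.buffer),
        }
    }
}

/// Consumer that holds two batches before reading them,
/// as an operator that caches its input might.
fn buffer_two(scan: &mut ReusingScan, copy_inputs: bool) -> (Vec<i32>, Vec<i32>) {
    let mut held = Vec::new();
    for _ in 0..2 {
        let batch = scan.next_batch();
        held.push(if copy_inputs { batch.deep_copy() } else { batch });
    }
    (held[0].snapshot(), held[1].snapshot())
}
```

In this model, `buffer_two(&mut ReusingScan::new(), false)` returns the second batch's values for both held batches, because the first batch was overwritten in place. Passing `true` preserves the first batch's contents.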

The following error was a factor in identifying this issue:

[info]   Cause: org.apache.comet.CometNativeException: overflow
[info]         at comet::errors::init::{{closure}}(__internal__:0)
[info]         at std::panicking::rust_panic_with_hook(__internal__:0)
[info]         at std::panicking::begin_panic_handler::{{closure}}(__internal__:0)
[info]         at std::sys::backtrace::__rust_end_short_backtrace(__internal__:0)
[info]         at __rustc::rust_begin_unwind(__internal__:0)
[info]         at core::panicking::panic_fmt(__internal__:0)
[info]         at core::option::expect_failed(__internal__:0)
[info]         at arrow_select::take::take_bytes(__internal__:0)
[info]         at arrow_select::take::take_impl(__internal__:0)
[info]         at arrow_select::take::take(__internal__:0)
[info]         at datafusion_physical_plan::joins::sort_merge_join::SortMergeJoinStream::freeze_streamed(__internal__:0)
[info]         at datafusion_physical_plan::joins::sort_merge_join::SortMergeJoinStream::poll_buffered_batches(__internal__:0)
[info]         at <datafusion_physical_plan::joins::sort_merge_join::SortMergeJoinStream as futures_core::stream::Stream>::poll_next(__internal__:0)
[info]         at comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}::{{closure}}(__internal__:0)
[info]         at Java_org_apache_comet_Native_executePlan(__internal__:0)
[info]         at <unknown>(__internal__:0)

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)
