[iceberg] Memory corruption due to unsafe reuse of mutable buffers #2131

@andygrove

Describe the bug

During testing of the Iceberg integration, we ran into memory corruption issues caused by unsafe reuse of mutable buffers.

In one example, the following plan contains a SortExec, which caches batches, but those batches come from a ScanExec that reuses its underlying buffers. The plan does include a CopyExec, but it uses the mode UnpackOrClone rather than UnpackOrDeepCopy, so the cached batches may still reference buffers that the scan later overwrites.

SortExec: expr=[col_0@0 ASC, col_2@2 ASC], preserve_partitioning=[false]
  CopyExec [UnpackOrClone]
    FilterExec: col_0@0 IS NOT NULL
      ScanExec: source=[CometBatchScan testhadoop.default.table (unknown)], schema=[col_0: Int64, col_1: Int32, col_2: Timestamp(Microsecond, Some("UTC"))]
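
The hazard in this plan can be illustrated with a simplified model. The sketch below is not Comet or Arrow code: `ReusingScan`, `shallow_clone`, `deep_copy`, and `cache_two_batches` are hypothetical names, and it assumes that a clone only adds a new handle to the same memory while a deep copy allocates fresh memory. Under those assumptions, an operator that caches shallow clones sees every cached batch change when the scan overwrites its buffer.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A batch modeled as a handle to a buffer that may be shared with its producer.
type Buffer = Rc<RefCell<Vec<i64>>>;

/// Hypothetical stand-in for a scan that reuses one buffer for every batch.
struct ReusingScan {
    buffer: Buffer,
}

impl ReusingScan {
    fn new() -> Self {
        ReusingScan { buffer: Rc::new(RefCell::new(Vec::new())) }
    }

    /// Overwrites the shared buffer in place and returns a handle to it.
    fn next_batch(&mut self, values: &[i64]) -> Buffer {
        {
            let mut buf = self.buffer.borrow_mut();
            buf.clear();
            buf.extend_from_slice(values);
        }
        Rc::clone(&self.buffer)
    }
}

/// Shallow copy (assumption): a new handle to the same memory.
fn shallow_clone(batch: &Buffer) -> Buffer {
    Rc::clone(batch)
}

/// Deep copy (assumption): a new allocation the producer cannot modify.
fn deep_copy(batch: &Buffer) -> Buffer {
    Rc::new(RefCell::new(batch.borrow().clone()))
}

/// Caches two batches the way a buffering operator such as a sort would,
/// then reads the cached contents back after the scan has moved on.
fn cache_two_batches(copy: fn(&Buffer) -> Buffer) -> Vec<Vec<i64>> {
    let inputs: [[i64; 3]; 2] = [[3, 1, 2], [9, 8, 7]];
    let mut scan = ReusingScan::new();
    let mut cached = Vec::new();
    for values in inputs.iter() {
        let batch = scan.next_batch(values);
        cached.push(copy(&batch));
    }
    cached.iter().map(|b| b.borrow().clone()).collect()
}
```

In this model, `cache_two_batches(shallow_clone)` returns the second batch twice because both cached handles point at the reused buffer, while `cache_two_batches(deep_copy)` returns the two batches as they were produced.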

This issue tracks the work of reviewing this problem again, writing better internal documentation, and improving testing.

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels: bug
