
debug_assert_eq! in BatchCoalescer panics in debug mode when batch_size < 4 #9506

@Tim-53

Describe the bug
debug_assert_eq!(self.current.capacity(), self.batch_size) in InProgressPrimitiveArray::ensure_capacity (and the same assertion in InProgressByteViewArray) panics in debug mode when batch_size is less than 4. Vec::reserve(n) only guarantees a capacity of at least n, not exactly n. Because of the MIN_NON_ZERO_CAP optimization in Rust's standard library, calling reserve(2) on an empty Vec of most numeric types yields a capacity of 4, not 2.
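
The capacity behaviour can be checked in isolation, without arrow. The following is a minimal sketch, not code from arrow-rs; the helper name `capacity_after_reserve` is made up for illustration. The exact capacity returned is a standard-library implementation detail, so the sketch relies only on the documented "at least" guarantee.

```rust
/// Hypothetical helper (not part of arrow-rs): reserve room for `n`
/// elements in an empty Vec<i32> and report the capacity actually obtained.
fn capacity_after_reserve(n: usize) -> usize {
    let mut v: Vec<i32> = Vec::new();
    // `reserve` may allocate more than requested; only `>= n` is guaranteed.
    v.reserve(n);
    v.capacity()
}

/// Mirrors the shape of the failing assertion: true only when the
/// capacity is exactly `n`. This may be false for small `n`.
fn capacity_is_exact(n: usize) -> bool {
    capacity_after_reserve(n) == n
}
```

Printing `capacity_after_reserve(2)` and `capacity_is_exact(2)` on a given toolchain shows whether the exact-equality check would hold there; only `capacity_after_reserve(2) >= 2` is safe to rely on.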

To Reproduce

  use std::sync::Arc;

  use arrow::array::{Int32Array, RecordBatch};
  use arrow::compute::BatchCoalescer;
  use arrow::datatypes::{DataType, Field, Schema};

  fn main() {
      let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)]));
      // A target batch size of 2 is below the capacity Vec::reserve actually allocates.
      let mut coalescer = BatchCoalescer::new(schema.clone(), 2);
      let batch = RecordBatch::try_new(
          schema,
          vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
      )
      .unwrap();
      coalescer.push_batch(batch).unwrap(); // panics in debug mode
  }

Expected behavior
No panic. The assertion should account for the fact that Vec::reserve guarantees at least the requested capacity, not exactly that capacity.
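
One possible shape for a relaxed check is sketched below. This is an illustration under stated assumptions, not the arrow-rs implementation: the `InProgress` struct, its fields, and the body of `ensure_capacity` are hypothetical stand-ins modeled only on the names quoted in this report.

```rust
/// Hypothetical stand-in for an in-progress array (not the arrow-rs type).
struct InProgress {
    current: Vec<i32>,
    batch_size: usize,
}

impl InProgress {
    fn new(batch_size: usize) -> Self {
        Self { current: Vec::new(), batch_size }
    }

    /// Allocate space for one batch if nothing has been allocated yet.
    fn ensure_capacity(&mut self) {
        if self.current.capacity() == 0 {
            self.current.reserve(self.batch_size);
        }
        // Relaxed check: `reserve` guarantees at least `batch_size`,
        // so assert a lower bound instead of exact equality.
        debug_assert!(self.current.capacity() >= self.batch_size);
    }
}
```

With this lower-bound check, `InProgress::new(2).ensure_capacity()` does not panic in debug builds, whatever capacity the allocator chooses.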

Additional context
Affected files:

  • arrow-select/src/coalesce/primitive.rs:61
  • arrow-select/src/coalesce/byte_view.rs:104

First observed via Apache DataFusion sqllogictests: apache/datafusion#20689
