ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653

@tustvold

Describe the bug

let ctx = SessionContext::new();

let mut options = ParquetReadOptions::default()
    .parquet_pruning(true)
    .to_listing_options(2);

// Enable stats collection (this is what triggers the panic)
options.collect_stat = true;

ctx.register_listing_table("patient", "/home/raphael/Downloads/part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet", options, None).await.unwrap();

let df = ctx.sql("SELECT patient.meta FROM patient LIMIT 10").await.unwrap();
df.show().await.unwrap();

Here, part-00000-f6337bce-7fcd-4021-9f9d-040413ea83f8-c000.snappy.parquet is the parquet file provided by @kesavkolla in #2439.

Panics with

called `Result::unwrap()` on an `Err` value: ArrowError(ComputeError("concat requires input of at least one array"))
thread 'physical_plan::file_format::parquet::tests::temp' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ComputeError("concat requires input of at least one array"))', datafusion/common/src/scalar.rs:1206:18
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::result::unwrap_failed
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1078:23
   4: datafusion_common::scalar::ScalarValue::to_array_of_size
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1198:22
   5: datafusion_common::scalar::ScalarValue::to_array_of_size::{{closure}}
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1253:45
   6: core::iter::adapters::map::map_fold::{{closure}}
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:84:28
   7: core::iter::traits::iterator::Iterator::fold
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:2362:21
   8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/adapters/map.rs:124:9
   9: core::iter::traits::iterator::Iterator::for_each
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:779:9
  10: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_extend.rs:40:17
  11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_from_iter_nested.rs:62:9
  12: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/spec_from_iter.rs:33:9
  13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/vec/mod.rs:2554:9
  14: core::iter::traits::iterator::Iterator::collect
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/iter/traits/iterator.rs:1784:9
  15: datafusion_common::scalar::ScalarValue::to_array_of_size
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:1248:48
  16: datafusion_common::scalar::ScalarValue::to_array
             at /home/raphael/repos/external/arrow-datafusion/datafusion/common/src/scalar.rs:658:9
  17: datafusion::datasource::get_statistics_with_limit::{{closure}}
             at ./src/datasource/mod.rs:75:56
  18: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  19: datafusion::datasource::listing::table::ListingTable::list_files_for_scan::{{closure}}
             at ./src/datasource/listing/table.rs:394:67
  20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  21: <datafusion::datasource::listing::table::ListingTable as datafusion::datasource::datasource::TableProvider>::scan::{{closure}}
             at ./src/datasource/listing/table.rs:310:53
  22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  23: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
  24: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
             at ./src/physical_plan/planner.rs:392:64
  25: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  26: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
  27: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}
             at ./src/physical_plan/planner.rs:623:84
  28: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  29: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/future.rs:124:9
  30: datafusion::physical_plan::planner::DefaultPhysicalPlanner::create_initial_plan::{{closure}}

Setting options.collect_stat = false eliminates the panic.

Expected behavior

The above should not panic
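
For illustration only, here is a minimal sketch of the failure pattern visible in the panic message: an `Err` from a concat call that received zero arrays is unwrapped instead of being handled. Everything in it (`concat`, `build_unwrap`, `build_checked`, and the use of `Vec<i64>` as a stand-in for an array) is a hypothetical toy model, not DataFusion or arrow code. Which value in the parquet statistics leads to the empty concat is not established in this report.

```rust
// Toy model only: these functions are hypothetical stand-ins,
// not DataFusion or arrow APIs.

/// Stand-in for a concat kernel that rejects empty input,
/// reusing the error text from the panic message.
fn concat(arrays: &[Vec<i64>]) -> Result<Vec<i64>, String> {
    if arrays.is_empty() {
        return Err("concat requires input of at least one array".to_string());
    }
    Ok(arrays.iter().flatten().copied().collect())
}

/// Panicking shape: unwraps the concat result, so zero inputs panic.
fn build_unwrap(parts: &[Vec<i64>]) -> Vec<i64> {
    concat(parts).unwrap()
}

/// Non-panicking shape: treats zero inputs as an empty result and
/// returns any other error to the caller instead of unwrapping.
fn build_checked(parts: &[Vec<i64>]) -> Result<Vec<i64>, String> {
    if parts.is_empty() {
        return Ok(Vec::new());
    }
    concat(parts)
}

fn main() {
    // Non-empty input behaves the same in both shapes.
    println!("{:?}", build_unwrap(&[vec![1, 2], vec![3]]));
    println!("{:?}", build_checked(&[vec![1, 2], vec![3]]));

    // Empty input: the checked shape returns a value rather than panicking.
    println!("{:?}", build_checked(&[]));
}
```

Whether an empty result or a propagated error is the right behavior for the real code is a design choice this sketch does not settle; it only shows that either avoids the `unwrap` panic.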

Additional context

Follow-on from #2453, which is fixed by #2631.

Labels: bug (Something isn't working)