[Rust][DataFusion] improve performance of building literal arrays #25041

@andygrove While profiling, I noticed a potential performance improvement, described below.

NOTE: This issue would be irrelevant if DataFusion could use scalar comparison operations, as described here:
https://issues.apache.org/jira/browse/ARROW-8907

The build_literal_array function, defined here: https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/expressions.rs#L1204
creates an array of literal values using a loop, but benchmarks suggest that creating the array from a Vec is much faster
(about 58 times faster when building an array with 100000 values).
Here are the benchmark results:

array builder/array from vec: time: [25.644 us 25.883 us 26.214 us]
array builder/array from values: time: [1.4985 ms 1.5090 ms 1.5213 ms]

Here is the benchmark code:

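// Imports assumed from the arrow and criterion crates.
use arrow::array::{Array, PrimitiveArray, PrimitiveBuilder};
use arrow::datatypes::Float32Type;
use criterion::Criterion;

fn bench_array_builder(c: &mut Criterion) {
    let array_len = 100000;
    let mut count = 0;
    let mut group = c.benchmark_group("array builder");

    // Build the array in one step from a Vec of identical values.
    group.bench_function("array from vec", |b| b.iter(|| {
        let float_array: PrimitiveArray<Float32Type> = vec![1.0; array_len].into();
        count = float_array.len();
    }));
    println!("built array with {} values", count);

    // Build the array by appending one value at a time.
    group.bench_function("array from values", |b| b.iter(|| {
        // let float_array: PrimitiveArray<Float32Type> = build_literal_array(1.0, array_len);
        let mut builder = PrimitiveBuilder::<Float32Type>::new(array_len);
        // Loop over array_len rather than count, so this case does not
        // depend on the value left behind by the previous benchmark.
        for _ in 0..array_len {
            builder.append_value(1.0).unwrap();
        }
        let float_array = builder.finish();
        count = float_array.len();
    }));
    println!("built array with {} values", count);

    group.finish();
}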
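
For illustration only, here is a minimal std-only sketch of the two construction strategies the benchmark compares. It does not use the arrow crate: `LoopBuilder`, `literal_via_loop`, and `literal_via_vec` are hypothetical names made up for this sketch, and a plain `Vec<f32>` stands in for the array. It shows the shape of the proposed change, not the actual build_literal_array code.

```rust
/// Hypothetical stand-in for a builder-style API that appends one value
/// per call and returns a Result each time.
struct LoopBuilder {
    values: Vec<f32>,
}

impl LoopBuilder {
    fn new(capacity: usize) -> Self {
        LoopBuilder { values: Vec::with_capacity(capacity) }
    }

    fn append_value(&mut self, v: f32) -> Result<(), String> {
        self.values.push(v);
        Ok(())
    }

    fn finish(self) -> Vec<f32> {
        self.values
    }
}

/// Loop-based construction: one fallible call per element.
fn literal_via_loop(value: f32, len: usize) -> Vec<f32> {
    let mut builder = LoopBuilder::new(len);
    for _ in 0..len {
        builder.append_value(value).expect("append failed");
    }
    builder.finish()
}

/// Bulk construction: the whole buffer is filled in a single operation.
fn literal_via_vec(value: f32, len: usize) -> Vec<f32> {
    vec![value; len]
}
```

Both functions return the same contents; only the construction path differs. In the benchmark above, the bulk path corresponds to `vec![1.0; array_len].into()`.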

Reporter: Yordan Pavlov / @yordan-pavlov

Note: This issue was originally created as ARROW-8908. Please see the migration documentation for further details.
