Description
@andygrove I was doing some profiling and noticed a potential performance improvement, described below.
NOTE: The issue described below would be irrelevant if it were possible to use scalar comparison operations in DataFusion, as described here:
https://issues.apache.org/jira/browse/ARROW-8907
The build_literal_array function defined here https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/expressions.rs#L1204
creates an array of literal values by appending them one at a time in a loop, but benchmarks show that creating the array from a vec is much faster
(about 58 times faster when building an array with 100000 values).
Here are the benchmark results:
array builder/array from vec: time: [25.644 us 25.883 us 26.214 us]
array builder/array from values: time: [1.4985 ms 1.5090 ms 1.5213 ms]
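The "about 58 times" figure follows from the median times above (the middle value of each triple). As a quick check, here is a small sketch of that arithmetic; the helper names `speedup` and `ns_per_value` are made up for illustration and are not part of Arrow or Criterion.

```rust
/// Ratio of the slower time to the faster time (both in the same unit).
fn speedup(slow: f64, fast: f64) -> f64 {
    slow / fast
}

/// Average cost per value in nanoseconds, given a total time in microseconds.
fn ns_per_value(total_us: f64, values: usize) -> f64 {
    total_us * 1_000.0 / values as f64
}
```

With the reported medians, `speedup(1509.0, 25.883)` comes out at roughly 58.3. Per value, `ns_per_value(1509.0, 100_000)` is about 15.09 ns for the builder loop, and `ns_per_value(25.883, 100_000)` is about 0.26 ns for the vec conversion.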
Here is the benchmark code:
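// Import paths assume the arrow and criterion crates as they were at the time of this issue.
use arrow::array::{Array, PrimitiveArray, PrimitiveBuilder};
use arrow::datatypes::Float32Type;
use criterion::Criterion;

fn bench_array_builder(c: &mut Criterion) {
    let array_len = 100000;
    let mut count = 0;
    let mut group = c.benchmark_group("array builder");
    // Build the array in one step from a Vec.
    group.bench_function("array from vec", |b| b.iter(|| {
        let float_array: PrimitiveArray<Float32Type> = vec![1.0; array_len].into();
        count = float_array.len();
    }));
    println!("built array with {} values", count);
    // Build the array one value at a time, as build_literal_array does.
    group.bench_function("array from values", |b| b.iter(|| {
        // let float_array: PrimitiveArray<Float32Type> = build_literal_array(1.0, array_len);
        let mut builder = PrimitiveBuilder::<Float32Type>::new(array_len);
        // Loop over array_len rather than count, so that this benchmark does
        // not depend on the previous one having run first.
        for _ in 0..array_len {
            // append_value returns a Result in this Arrow version; unwrap it
            // rather than discarding a reference to it.
            builder.append_value(1.0).unwrap();
        }
        let float_array = builder.finish();
        count = float_array.len();
    }));
    println!("built array with {} values", count);
    group.finish();
}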
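The two construction strategies compared above can also be sketched without Arrow or Criterion. The following is only a standard-library analogue using `Vec<f32>`; the function names are hypothetical, it is not the DataFusion implementation, and the speed difference for a plain `Vec` may be much smaller than the one measured for `PrimitiveBuilder`. It shows only that the bulk form yields the same values as the loop, which is what makes the swap safe.

```rust
/// Element-by-element construction, analogous to calling `append_value`
/// in a loop on a builder. (Hypothetical helper, std only.)
fn literal_values_loop(value: f32, len: usize) -> Vec<f32> {
    let mut values = Vec::with_capacity(len);
    for _ in 0..len {
        values.push(value);
    }
    values
}

/// Bulk construction, analogous to `vec![value; len].into()`.
/// (Hypothetical helper, std only.)
fn literal_values_bulk(value: f32, len: usize) -> Vec<f32> {
    vec![value; len]
}
```

For any `value` and `len`, both functions return the same vector, so a literal-array helper could in principle switch from the loop form to the bulk form without changing its output.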
Reporter: Yordan Pavlov / @yordan-pavlov
Related issues:
- [Rust][DataFusion] Improve performance of equality to a constant predicate support (is related to)
- [Rust] [Datafusion] Optimize literal expression evaluation (is related to)
Note: This issue was originally created as ARROW-8908. Please see the migration documentation for further details.